{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RqLQ-hbyb5jH"
   },
   "source": [
    "# Fraud Detection in Electricity and Gas Consumption Challenge\n",
    "\n",
    "This is a simple starter notebook based on the tutorial prepared by Joy Wawira. Check out the article [here](https://zindi.africa/learn/fraud-detection-in-electricity-and-gas-consumption-challenge-tutorial) for a more detailed description of the steps taken.\n",
    "\n",
    "This notebook covers:\n",
    "- Downloading the data straight from Zindi and onto colab\n",
    "- Loading the data and carrying out simple EDA to understand the data and prepare for modelling \n",
    "- Preprocessing the data and feature engineering \n",
    "- Creating a simple LGBM model and predicting on the test set\n",
    "- Prepare submission file and save as csv\n",
    "- Some tips on how to improve model performance and your score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e6LfbE6PJbmp"
   },
   "source": [
    "# Pre-Requisites"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GKfpS4UVJ4Wj"
   },
   "source": [
    "## Import Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "OJMD_I3EJWP4"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\MSI GF 63\\miniconda3\\envs\\tf\\lib\\site-packages\\cupy\\_environment.py:213: UserWarning: CUDA path could not be detected. Set CUDA_PATH environment variable if CuPy fails to load.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import lightgbm\n",
    "from lightgbm import LGBMClassifier\n",
    "\n",
    "import warnings\n",
    "warnings.simplefilter('ignore')\n",
    "pd.set_option('display.max_columns', None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "exEGyf6vJ_4T"
   },
   "source": [
    "## Read the Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "I2oIx7IuKCJm"
   },
   "outputs": [],
   "source": [
    "client_train=pd.read_csv(\"C:/Users/MSI GF 63/Desktop/Steg/train/client_train.csv\", low_memory=False)\n",
    "invoice_train=pd.read_csv(\"C:/Users/MSI GF 63/Desktop/Steg/train/invoice_train.csv\", low_memory=False)\n",
    "\n",
    "client_test=pd.read_csv(\"C:/Users/MSI GF 63/Desktop/Steg/test/client_test.csv\", low_memory=False)\n",
    "invoice_test=pd.read_csv(\"C:/Users/MSI GF 63/Desktop/Steg/test/invoice_test.csv\", low_memory=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cDRp9i5ZO9mz"
   },
   "source": [
    "## Data Understanding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "h4zQzvLjKZ6E",
    "outputId": "deb14c51-2f5d-40cd-aa35-771477e044e0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(135493, 6) (4476749, 16) (58069, 5) (4476749, 16)\n"
     ]
    }
   ],
   "source": [
    "#compare size of the various datasets\n",
    "print(client_train.shape, invoice_train.shape, client_test.shape, invoice_train.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 270
    },
    "id": "Tw1On2leQsxU",
    "outputId": "4bbe6376-2007-4ea4-cd6f-e53e4c0bcb33"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>client_id</th>\n",
       "      <th>invoice_date</th>\n",
       "      <th>tarif_type</th>\n",
       "      <th>counter_number</th>\n",
       "      <th>counter_statue</th>\n",
       "      <th>counter_code</th>\n",
       "      <th>reading_remarque</th>\n",
       "      <th>counter_coefficient</th>\n",
       "      <th>consommation_level_1</th>\n",
       "      <th>consommation_level_2</th>\n",
       "      <th>consommation_level_3</th>\n",
       "      <th>consommation_level_4</th>\n",
       "      <th>old_index</th>\n",
       "      <th>new_index</th>\n",
       "      <th>months_number</th>\n",
       "      <th>counter_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>train_Client_0</td>\n",
       "      <td>2014-03-24</td>\n",
       "      <td>11</td>\n",
       "      <td>1335667</td>\n",
       "      <td>0</td>\n",
       "      <td>203</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>82</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>14302</td>\n",
       "      <td>14384</td>\n",
       "      <td>4</td>\n",
       "      <td>ELEC</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>train_Client_0</td>\n",
       "      <td>2013-03-29</td>\n",
       "      <td>11</td>\n",
       "      <td>1335667</td>\n",
       "      <td>0</td>\n",
       "      <td>203</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>1200</td>\n",
       "      <td>184</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12294</td>\n",
       "      <td>13678</td>\n",
       "      <td>4</td>\n",
       "      <td>ELEC</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>train_Client_0</td>\n",
       "      <td>2015-03-23</td>\n",
       "      <td>11</td>\n",
       "      <td>1335667</td>\n",
       "      <td>0</td>\n",
       "      <td>203</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>123</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>14624</td>\n",
       "      <td>14747</td>\n",
       "      <td>4</td>\n",
       "      <td>ELEC</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>train_Client_0</td>\n",
       "      <td>2015-07-13</td>\n",
       "      <td>11</td>\n",
       "      <td>1335667</td>\n",
       "      <td>0</td>\n",
       "      <td>207</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>102</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>14747</td>\n",
       "      <td>14849</td>\n",
       "      <td>4</td>\n",
       "      <td>ELEC</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>train_Client_0</td>\n",
       "      <td>2016-11-17</td>\n",
       "      <td>11</td>\n",
       "      <td>1335667</td>\n",
       "      <td>0</td>\n",
       "      <td>207</td>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>572</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>15066</td>\n",
       "      <td>15638</td>\n",
       "      <td>12</td>\n",
       "      <td>ELEC</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        client_id invoice_date  tarif_type  counter_number counter_statue  \\\n",
       "0  train_Client_0   2014-03-24          11         1335667              0   \n",
       "1  train_Client_0   2013-03-29          11         1335667              0   \n",
       "2  train_Client_0   2015-03-23          11         1335667              0   \n",
       "3  train_Client_0   2015-07-13          11         1335667              0   \n",
       "4  train_Client_0   2016-11-17          11         1335667              0   \n",
       "\n",
       "   counter_code  reading_remarque  counter_coefficient  consommation_level_1  \\\n",
       "0           203                 8                    1                    82   \n",
       "1           203                 6                    1                  1200   \n",
       "2           203                 8                    1                   123   \n",
       "3           207                 8                    1                   102   \n",
       "4           207                 9                    1                   572   \n",
       "\n",
       "   consommation_level_2  consommation_level_3  consommation_level_4  \\\n",
       "0                     0                     0                     0   \n",
       "1                   184                     0                     0   \n",
       "2                     0                     0                     0   \n",
       "3                     0                     0                     0   \n",
       "4                     0                     0                     0   \n",
       "\n",
       "   old_index  new_index  months_number counter_type  \n",
       "0      14302      14384              4         ELEC  \n",
       "1      12294      13678              4         ELEC  \n",
       "2      14624      14747              4         ELEC  \n",
       "3      14747      14849              4         ELEC  \n",
       "4      15066      15638             12         ELEC  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#print top rows of dataset\n",
    "invoice_train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 206
    },
    "id": "jUcrU9IYSMZJ",
    "outputId": "e70e5da0-7451-48ca-8c3a-643524f3139c"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>disrict</th>\n",
       "      <th>client_id</th>\n",
       "      <th>client_catg</th>\n",
       "      <th>region</th>\n",
       "      <th>creation_date</th>\n",
       "      <th>target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>60</td>\n",
       "      <td>train_Client_0</td>\n",
       "      <td>11</td>\n",
       "      <td>101</td>\n",
       "      <td>31/12/1994</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>69</td>\n",
       "      <td>train_Client_1</td>\n",
       "      <td>11</td>\n",
       "      <td>107</td>\n",
       "      <td>29/05/2002</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>62</td>\n",
       "      <td>train_Client_10</td>\n",
       "      <td>11</td>\n",
       "      <td>301</td>\n",
       "      <td>13/03/1986</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>69</td>\n",
       "      <td>train_Client_100</td>\n",
       "      <td>11</td>\n",
       "      <td>105</td>\n",
       "      <td>11/07/1996</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>62</td>\n",
       "      <td>train_Client_1000</td>\n",
       "      <td>11</td>\n",
       "      <td>303</td>\n",
       "      <td>14/10/2014</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   disrict          client_id  client_catg  region creation_date  target\n",
       "0       60     train_Client_0           11     101    31/12/1994     0.0\n",
       "1       69     train_Client_1           11     107    29/05/2002     0.0\n",
       "2       62    train_Client_10           11     301    13/03/1986     0.0\n",
       "3       69   train_Client_100           11     105    11/07/1996     0.0\n",
       "4       62  train_Client_1000           11     303    14/10/2014     0.0"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#print top rows of dataset\n",
    "client_train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def extract_date_components_client(df, date_column):\n",
    "    # Convert the date string column to pandas datetime objects with specified formats\n",
    "    df[date_column] = pd.to_datetime(df[date_column], format=\"%d/%m/%Y\" )\n",
    "   \n",
    "    # Modify the 'date_column' to contain the extracted year, month, and day components\n",
    "    df['creation_year'] = df[date_column].dt.year\n",
    "    df['creation_month'] = df[date_column].dt.month\n",
    "    df['creation_day'] = df[date_column].dt.day\n",
    "\n",
    "    # Drop the original date string column if desired\n",
    "    df.drop(date_column, axis=1, inplace=True)\n",
    "\n",
    "    return df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "client_train = extract_date_components_client(client_train, 'creation_date')\n",
    "#train = extract_date_components_invoice(train, 'invoice_date')\n",
    "\n",
    "\n",
    "client_test = extract_date_components_client(client_test, 'creation_date')\n",
    "#test = extract_date_components_invoice(test, 'invoice_date')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 300
    },
    "id": "fFBTeG8YSicC",
    "outputId": "551f42dc-8b1c-42a9-9a60-f6dc9279911b"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tarif_type</th>\n",
       "      <th>counter_number</th>\n",
       "      <th>counter_code</th>\n",
       "      <th>reading_remarque</th>\n",
       "      <th>counter_coefficient</th>\n",
       "      <th>consommation_level_1</th>\n",
       "      <th>consommation_level_2</th>\n",
       "      <th>consommation_level_3</th>\n",
       "      <th>consommation_level_4</th>\n",
       "      <th>old_index</th>\n",
       "      <th>new_index</th>\n",
       "      <th>months_number</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>4.476749e+06</td>\n",
       "      <td>4.476749e+06</td>\n",
       "      <td>4.476749e+06</td>\n",
       "      <td>4.476749e+06</td>\n",
       "      <td>4.476749e+06</td>\n",
       "      <td>4.476749e+06</td>\n",
       "      <td>4.476749e+06</td>\n",
       "      <td>4.476749e+06</td>\n",
       "      <td>4.476749e+06</td>\n",
       "      <td>4.476749e+06</td>\n",
       "      <td>4.476749e+06</td>\n",
       "      <td>4.476749e+06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>2.012804e+01</td>\n",
       "      <td>1.230587e+11</td>\n",
       "      <td>1.724884e+02</td>\n",
       "      <td>7.321702e+00</td>\n",
       "      <td>1.003040e+00</td>\n",
       "      <td>4.109795e+02</td>\n",
       "      <td>1.093225e+02</td>\n",
       "      <td>2.030620e+01</td>\n",
       "      <td>5.292588e+01</td>\n",
       "      <td>1.776700e+04</td>\n",
       "      <td>1.834970e+04</td>\n",
       "      <td>4.483095e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>1.347256e+01</td>\n",
       "      <td>1.657267e+12</td>\n",
       "      <td>1.338871e+02</td>\n",
       "      <td>1.571654e+00</td>\n",
       "      <td>3.083466e-01</td>\n",
       "      <td>7.573080e+02</td>\n",
       "      <td>1.220123e+03</td>\n",
       "      <td>1.574239e+02</td>\n",
       "      <td>8.754725e+02</td>\n",
       "      <td>4.036693e+04</td>\n",
       "      <td>4.095321e+04</td>\n",
       "      <td>3.128335e+03</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>8.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>5.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>1.100000e+01</td>\n",
       "      <td>1.211080e+05</td>\n",
       "      <td>5.000000e+00</td>\n",
       "      <td>6.000000e+00</td>\n",
       "      <td>1.000000e+00</td>\n",
       "      <td>7.900000e+01</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>1.791000e+03</td>\n",
       "      <td>2.056000e+03</td>\n",
       "      <td>4.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>1.100000e+01</td>\n",
       "      <td>4.945610e+05</td>\n",
       "      <td>2.030000e+02</td>\n",
       "      <td>8.000000e+00</td>\n",
       "      <td>1.000000e+00</td>\n",
       "      <td>2.740000e+02</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>7.690000e+03</td>\n",
       "      <td>8.192000e+03</td>\n",
       "      <td>4.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>4.000000e+01</td>\n",
       "      <td>1.115161e+06</td>\n",
       "      <td>2.070000e+02</td>\n",
       "      <td>9.000000e+00</td>\n",
       "      <td>1.000000e+00</td>\n",
       "      <td>6.000000e+02</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>2.166000e+04</td>\n",
       "      <td>2.234300e+04</td>\n",
       "      <td>4.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>4.500000e+01</td>\n",
       "      <td>2.798115e+13</td>\n",
       "      <td>6.000000e+02</td>\n",
       "      <td>4.130000e+02</td>\n",
       "      <td>5.000000e+01</td>\n",
       "      <td>9.999100e+05</td>\n",
       "      <td>9.990730e+05</td>\n",
       "      <td>6.449200e+04</td>\n",
       "      <td>5.479460e+05</td>\n",
       "      <td>2.800280e+06</td>\n",
       "      <td>2.870972e+06</td>\n",
       "      <td>6.366240e+05</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         tarif_type  counter_number  counter_code  reading_remarque  \\\n",
       "count  4.476749e+06    4.476749e+06  4.476749e+06      4.476749e+06   \n",
       "mean   2.012804e+01    1.230587e+11  1.724884e+02      7.321702e+00   \n",
       "std    1.347256e+01    1.657267e+12  1.338871e+02      1.571654e+00   \n",
       "min    8.000000e+00    0.000000e+00  0.000000e+00      5.000000e+00   \n",
       "25%    1.100000e+01    1.211080e+05  5.000000e+00      6.000000e+00   \n",
       "50%    1.100000e+01    4.945610e+05  2.030000e+02      8.000000e+00   \n",
       "75%    4.000000e+01    1.115161e+06  2.070000e+02      9.000000e+00   \n",
       "max    4.500000e+01    2.798115e+13  6.000000e+02      4.130000e+02   \n",
       "\n",
       "       counter_coefficient  consommation_level_1  consommation_level_2  \\\n",
       "count         4.476749e+06          4.476749e+06          4.476749e+06   \n",
       "mean          1.003040e+00          4.109795e+02          1.093225e+02   \n",
       "std           3.083466e-01          7.573080e+02          1.220123e+03   \n",
       "min           0.000000e+00          0.000000e+00          0.000000e+00   \n",
       "25%           1.000000e+00          7.900000e+01          0.000000e+00   \n",
       "50%           1.000000e+00          2.740000e+02          0.000000e+00   \n",
       "75%           1.000000e+00          6.000000e+02          0.000000e+00   \n",
       "max           5.000000e+01          9.999100e+05          9.990730e+05   \n",
       "\n",
       "       consommation_level_3  consommation_level_4     old_index     new_index  \\\n",
       "count          4.476749e+06          4.476749e+06  4.476749e+06  4.476749e+06   \n",
       "mean           2.030620e+01          5.292588e+01  1.776700e+04  1.834970e+04   \n",
       "std            1.574239e+02          8.754725e+02  4.036693e+04  4.095321e+04   \n",
       "min            0.000000e+00          0.000000e+00  0.000000e+00  0.000000e+00   \n",
       "25%            0.000000e+00          0.000000e+00  1.791000e+03  2.056000e+03   \n",
       "50%            0.000000e+00          0.000000e+00  7.690000e+03  8.192000e+03   \n",
       "75%            0.000000e+00          0.000000e+00  2.166000e+04  2.234300e+04   \n",
       "max            6.449200e+04          5.479460e+05  2.800280e+06  2.870972e+06   \n",
       "\n",
       "       months_number  \n",
       "count   4.476749e+06  \n",
       "mean    4.483095e+01  \n",
       "std     3.128335e+03  \n",
       "min     0.000000e+00  \n",
       "25%     4.000000e+00  \n",
       "50%     4.000000e+00  \n",
       "75%     4.000000e+00  \n",
       "max     6.366240e+05  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Get a summary for all numerical columns\n",
    "invoice_train.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 300
    },
    "id": "egcBZ2ysSkji",
    "outputId": "37102972-554b-4063-ce3b-14dae74b3540"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>disrict</th>\n",
       "      <th>client_catg</th>\n",
       "      <th>region</th>\n",
       "      <th>target</th>\n",
       "      <th>creation_year</th>\n",
       "      <th>creation_month</th>\n",
       "      <th>creation_day</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>135493.000000</td>\n",
       "      <td>135493.000000</td>\n",
       "      <td>135493.000000</td>\n",
       "      <td>135493.000000</td>\n",
       "      <td>135493.000000</td>\n",
       "      <td>135493.000000</td>\n",
       "      <td>135493.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>63.511222</td>\n",
       "      <td>11.512506</td>\n",
       "      <td>206.159809</td>\n",
       "      <td>0.055841</td>\n",
       "      <td>2002.183552</td>\n",
       "      <td>7.278524</td>\n",
       "      <td>17.464031</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>3.354400</td>\n",
       "      <td>4.423761</td>\n",
       "      <td>104.207044</td>\n",
       "      <td>0.229614</td>\n",
       "      <td>11.565963</td>\n",
       "      <td>3.522944</td>\n",
       "      <td>8.701108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>60.000000</td>\n",
       "      <td>11.000000</td>\n",
       "      <td>101.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1977.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>62.000000</td>\n",
       "      <td>11.000000</td>\n",
       "      <td>103.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1994.000000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>10.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>62.000000</td>\n",
       "      <td>11.000000</td>\n",
       "      <td>107.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2005.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>18.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>69.000000</td>\n",
       "      <td>11.000000</td>\n",
       "      <td>307.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2012.000000</td>\n",
       "      <td>11.000000</td>\n",
       "      <td>25.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>69.000000</td>\n",
       "      <td>51.000000</td>\n",
       "      <td>399.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2019.000000</td>\n",
       "      <td>12.000000</td>\n",
       "      <td>31.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             disrict    client_catg         region         target  \\\n",
       "count  135493.000000  135493.000000  135493.000000  135493.000000   \n",
       "mean       63.511222      11.512506     206.159809       0.055841   \n",
       "std         3.354400       4.423761     104.207044       0.229614   \n",
       "min        60.000000      11.000000     101.000000       0.000000   \n",
       "25%        62.000000      11.000000     103.000000       0.000000   \n",
       "50%        62.000000      11.000000     107.000000       0.000000   \n",
       "75%        69.000000      11.000000     307.000000       0.000000   \n",
       "max        69.000000      51.000000     399.000000       1.000000   \n",
       "\n",
       "       creation_year  creation_month   creation_day  \n",
       "count  135493.000000   135493.000000  135493.000000  \n",
       "mean     2002.183552        7.278524      17.464031  \n",
       "std        11.565963        3.522944       8.701108  \n",
       "min      1977.000000        1.000000       1.000000  \n",
       "25%      1994.000000        4.000000      10.000000  \n",
       "50%      2005.000000        8.000000      18.000000  \n",
       "75%      2012.000000       11.000000      25.000000  \n",
       "max      2019.000000       12.000000      31.000000  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Get a summary for all numerical columns\n",
    "client_train.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-8gQmkajSnXs",
    "outputId": "fef429ad-2fd5-46e3-9d3b-32def757324c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 4476749 entries, 0 to 4476748\n",
      "Data columns (total 16 columns):\n",
      " #   Column                Dtype \n",
      "---  ------                ----- \n",
      " 0   client_id             object\n",
      " 1   invoice_date          object\n",
      " 2   tarif_type            int64 \n",
      " 3   counter_number        int64 \n",
      " 4   counter_statue        object\n",
      " 5   counter_code          int64 \n",
      " 6   reading_remarque      int64 \n",
      " 7   counter_coefficient   int64 \n",
      " 8   consommation_level_1  int64 \n",
      " 9   consommation_level_2  int64 \n",
      " 10  consommation_level_3  int64 \n",
      " 11  consommation_level_4  int64 \n",
      " 12  old_index             int64 \n",
      " 13  new_index             int64 \n",
      " 14  months_number         int64 \n",
      " 15  counter_type          object\n",
      "dtypes: int64(12), object(4)\n",
      "memory usage: 546.5+ MB\n"
     ]
    }
   ],
   "source": [
    "#Get concise information of each column in dataset\n",
    "invoice_train.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "VII8aa8CSoed",
    "outputId": "330bd35b-fd37-422b-b718-c34d354661db"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 135493 entries, 0 to 135492\n",
      "Data columns (total 8 columns):\n",
      " #   Column          Non-Null Count   Dtype  \n",
      "---  ------          --------------   -----  \n",
      " 0   disrict         135493 non-null  int64  \n",
      " 1   client_id       135493 non-null  object \n",
      " 2   client_catg     135493 non-null  int64  \n",
      " 3   region          135493 non-null  int64  \n",
      " 4   target          135493 non-null  float64\n",
      " 5   creation_year   135493 non-null  int64  \n",
      " 6   creation_month  135493 non-null  int64  \n",
      " 7   creation_day    135493 non-null  int64  \n",
      "dtypes: float64(1), int64(6), object(1)\n",
      "memory usage: 8.3+ MB\n"
     ]
    }
   ],
   "source": [
    "#Get concise information of each column in dataset\n",
    "client_train.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "iiJ3e_pFSOWg",
    "outputId": "d074cb2a-d009-471f-f124-466db367df68"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "client_id - 135493\n",
      "invoice_date - 8275\n",
      "tarif_type - 17\n",
      "counter_number - 201893\n",
      "counter_statue - 12\n",
      "counter_code - 42\n",
      "reading_remarque - 8\n",
      "counter_coefficient - 16\n",
      "consommation_level_1 - 8295\n",
      "consommation_level_2 - 12576\n",
      "consommation_level_3 - 2253\n",
      "consommation_level_4 - 12075\n",
      "old_index - 155648\n",
      "new_index - 157980\n",
      "months_number - 1370\n",
      "counter_type - 2\n"
     ]
    }
   ],
   "source": [
    "#Getting unique values on the invoice train data\n",
    "for col in invoice_train.columns:\n",
    "    print(f\"{col} - {invoice_train[col].nunique()}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "uMC16gFETUAF",
    "outputId": "0e363f7b-35c4-45c2-c763-acf80585ef7c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "disrict - 4\n",
      "client_id - 135493\n",
      "client_catg - 3\n",
      "region - 25\n",
      "target - 2\n",
      "creation_year - 43\n",
      "creation_month - 12\n",
      "creation_day - 31\n"
     ]
    }
   ],
   "source": [
    "#Getting unique values on the invoice train data\n",
    "for col in client_train.columns:\n",
    "    print(f\"{col} - {client_train[col].nunique()}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "h10LPXsfVHhu",
    "outputId": "332ac9e1-02e3-4a3a-a8d6-1c8d706b9601"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "client_id               0\n",
       "invoice_date            0\n",
       "tarif_type              0\n",
       "counter_number          0\n",
       "counter_statue          0\n",
       "counter_code            0\n",
       "reading_remarque        0\n",
       "counter_coefficient     0\n",
       "consommation_level_1    0\n",
       "consommation_level_2    0\n",
       "consommation_level_3    0\n",
       "consommation_level_4    0\n",
       "old_index               0\n",
       "new_index               0\n",
       "months_number           0\n",
       "counter_type            0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#check for missing values\n",
    "invoice_train.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "b3XyoiIbVO-b",
    "outputId": "9200e12f-66fd-43a4-edf4-daa1e1cfdd11"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "disrict           0\n",
       "client_id         0\n",
       "client_catg       0\n",
       "region            0\n",
       "target            0\n",
       "creation_year     0\n",
       "creation_month    0\n",
       "creation_day      0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#check for missing values\n",
    "client_train.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "i3oZLTnSVSOT"
   },
   "source": [
    "No missing values in train set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 281
    },
    "id": "JOzSAlEWTR3j",
    "outputId": "97a46548-924e-44a4-9b85-e40d01979b99"
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkIAAAGzCAYAAADDgXghAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/P9b71AAAACXBIWXMAAA9hAAAPYQGoP6dpAAAzvUlEQVR4nO3dfVzV9f3/8ScXchEKiAp4vqLSJTGYGCTSlbWYtKiNzTY1m+SYdgFOQ0stQ1cuymZT5wXzuy3abXrL+H6n34ZFMUzZlHmBWkpp9g3T5u2gfRFO0kSEz++Pbnx+HsEL3GEI78f9dju3W+f9fp3P5/U+QDz9nM/ng5dlWZYAAAAM5N3VDQAAAHQVghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYCyCEGCoO++8U3feeWdXt9HjHTp0SF5eXiosLOz0fRUWFsrLy0uHDh2yx4YOHar77ruv0/ctSZs2bZKXl5c2bdr0b9kf4AkEIcDDWn8ZtfeYPXt2V7fXKS605rMfQ4cO7epW21ixYkWHQsrZ6/H19VVYWJgSExM1bdo0ffjhh13W17/Tldwb0FFe/K0xwLMKCws1adIkPffcc4qOjnabi4uLU0JCQtc0do7Wo0Ge+Nf7p59+qq1bt7qN/fSnP9WIESM0ZcoUe6x3797KyMj4l/fnSXFxcerfv/8lvw9eXl769re/rYkTJ8qyLNXX1+v9999XUVGRGhoa9NJLLyk3N9eutyxLjY2N6tWrl3x8fDqtL0lqbm5WU1OT/P395eXlJenrI0JxcXEqLi6+5O1cbm8tLS06ffq0/Pz85O3Nv7PRPfh2dQNAT/Wd73xHSUlJl1R76tSpbv3L4+qrr9bVV1/tNvboo4/q6quv1kMPPfQvb/9Ke3+uv/76Nut68cUXdf/992vGjBmKiYnRvffeK+nr4BQQENCp/TQ0NCgoKEg+Pj4dClue5u3t3elrBTztyvi/CmCQ1vMoXn/9dc2dO1f/8R//oauuukoul0u1tbWaOXOm4uPj1bt3bwUHB+s73/mO3n//fbdttHcuyNnbPvdf6qtWrdI111yjwMBAjRgxQn/96187eZVtXeraLvT+SFJRUZFiY2MVEBCguLg4rVu3Tg8//HCbj91aWlq0ePFifeMb31BAQIAiIiL0yCOP6MSJE3bN0KFDVVVVpc2bN9sfd13ueVP9+vXT66+/Ll9fX/3iF7+wx9s7R8jpdGrSpEkaNGiQ/P39NXDgQH3ve9+zv54X6qv1a79582Y9/vjjCg8P16BBg9zmzv2+kKR3331XCQkJCggIUGxsrP70pz+5zc+fP98+inS2c7d5od7O9/1XVFSkxMREBQYGqn///nrooYf0j3/8w63m4YcfVu/evfWPf/xDGRkZ6t27twYMGKCZM2equbn5Iu8+cPk4IgR0kvr6en3xxRduY/3797f/+/nnn5efn59mzpypxsZG+fn56cMPP9T69ev1wx/+UNHR0aqpqdFvfvMbjRo1Sh9++KEcDkeH+/jd736nRx55RLfccoumT5+uTz/9VN/97ncVFhamqKiof3mdl+rTTz/t0Nrae382bNigsWPHKj4+Xvn5+Tpx4oSysrL0H//xH23298gjj9gfU/7sZz9TdXW1li1bpt27d2vLli3q1auXFi9erKlTp6p379565plnJEkRERGXvcbBgwdr1KhReu+99+RyuRQcHNxu3ZgxY1RVVaWpU6dq6NChOnbsmEpLS3X48GENHTr0kvp6/PHHNWDAAOXl5amhoeGCfR08eFBjx47Vo48+qszMTL366qv64Q9/qJKSEn3729/u0Bo7+p61fg1uvvlm5efnq6amRkuWLNGWLVu0e/duhYaG2rXNzc1KS0tTcnKyfvnLX+ovf/mLFi1apGuuuUaPPfZYh/oELpkFwKNeffVVS1K7D8uyrPfee8+SZF199dXWV1995fbaU6dOWc3NzW5j1dXVlr+/v/Xcc8+12Ud1dbVbbeu233vvPcuyLOv06dNWeHi4lZCQYDU2Ntp1q1atsiRZo0aN8tzCzxEUFGRlZmbazy91bRd6f+Lj461BgwZZX375pT22adMmS5I1ZMgQe+yvf/2rJclavXq12+tLSkrajH/jG9/o0PsgycrOzj7v/LRp0yxJ1vvvv2+vUZL16quvWpZlWSdOnLAkWS+//PIF93O+vlq/9rfddpt15syZdufO/r4YMmSIJcn67//+b3usvr7eGjhwoDV8+HB7bN68eVZ7vxLa2+b5ejvf919cXJz1z3/+064rLi62JFl5eXn2WGZmpiXJ7XvBsixr+PDhVmJiYpt9AZ7CR2NAJ1m+fLlKS0vdHmfLzMxUYGCg25i/v799Hkxzc7P+7//+T71799YNN9ygXbt2dbiHnTt36tixY3r00Ufl5+dnjz/88MMKCQm5jFVdvo6u7dz35+jRo9q7d68mTpyo3r172+OjRo1SfHy822uLiooUEhKib3/72/riiy/sR2Jionr37q333nuvk1Ypu7cvv/yy3fnAwED5+flp06ZNbh/TddTkyZMv+Xwgh8Oh73//+/bz4OBgTZw4Ubt375bT6bzsHi6m9fvv8ccfdzt3KD09XTExMdqwYUOb1zz66KNuz2+//XZ9+umnndYjwEdjQCcZMWLEBU+WPveKMunr81qWLFmiFStWqLq62u3ciH79+nW4h88++0ySdN1117mN9+rVq83Jze2pra3V6dOn7eeBgYGXHaA6urZz35/WtVx77bVtaq+99lq3MHXw4EHV19crPDy83V6OHTt2WWu4FCdPnpQk9enTp915f39/vfTSS5oxY4YiIiI0cuRI3XfffZo4caIiIyMveT/tff+cz7XXXtvm/J/rr79e0tfnMHVkvx3R+jW74YYb2szFxMTob3/7m9tYQECABgwY4DbWt2/ffykwAhdDEAK6yLlHgyTphRde0LPPPquf/OQnev755xUWFiZvb29Nnz5dLS0tdl17J7VK8vhJpT/4wQ+0efNm+3lmZuZl3z/mUtfWqr3351K1tLQoPDxcq1evbnf+3F+2nrRv3z75+PhcMKhMnz5d999/v9avX6933nlHzz77rPLz87Vx40YNHz78kvbzr7w/7fl3fU9dSFde8QZzEYSAK8h//dd/6a677tLvfvc7t/G6ujq3E6379u1rj5+t9V/grYYMGSLp6yMk3/rWt+zxpqYmVVdXa9iwYRfsZ9GiRW7/Gr+ck7VbXerazqd1LZ988kmbuXPHrrnmGv3lL3/RrbfeetHAcL4AcDkOHz6szZs3KyUl5bxHhM7uccaMGZoxY4YOHjyohIQELVq0SH/84x893tcnn3wiy7Lctvnxxx9Lkn213dnfU2efwHzu91RHemv9mh04cMDt+691rHUe6EqcIwRcQXx8fGSdc4/ToqKiNpcaX3PNNZKk8vJye6y5uVmrVq1yq0tKStKAAQNUUFDg9hFXYWFhmxDVnsTERKWmptqP2NjYji7JdqlrOx+Hw6G4uDj94Q9/sD9+kqTNmzdr7969brU/+tGP1NzcrOeff77Nds6cOeO29qCgoEt6Ly6mtrZW48ePV3Nzs301VXu++uornTp1ym3smmuuUZ8+fdTY2OjxvqSvz69at26d/dzlcukPf/iDEhIS7I/F2vueamho0GuvvdZme5faW1JSksLDw1VQUOC2trffflsfffSR0tPTL3dJgMdwRAi4gtx333167rnnNGnSJN1yyy3au3evVq9e3eZ8nm984xsaOXKk5syZo9raWoWFhen111/XmTNn3Op69eqlBQsW6JFHHtG3vvUtjR07VtXV1Xr11Vcv6RwhT7rUtV3ICy+8oO9973u69dZbNWnSJJ04cULLli1TXFycWzgaNWqUHnnkEeXn52vPnj0aPXq0evXqpYMHD6qoqEhLlizRAw88IOnrsLdy5UotWLBA1157rcLDw9scvTjXxx9/rD/+8Y+yLEsul8u+s/TJkyf1yiuv6J577rnga++++2796Ec/UmxsrHx9fbVu3TrV1NRo3Lhxdt3l9HU+119/vbKysrRjxw5FRETo97//vWpqavTqq6/aNaNHj9bgwYOVlZWlJ598Uj4+Pvr973+vAQMG6PDhw27bu9TeevXqpZdeekmTJk3SqFGjNH78ePvy+aFDh+qJJ564rPUAHtW1F60BPU/r5cY7duxod771EuOioqI2c6dOnbJmzJhhDRw40AoMDLRuvfVWq6Kiwho1alSby5X/93//10pNTbX8/f2tiIgI6+mnn7ZKS0vdLl9utWLFCis6Otry9/e3kpKSrPLy8na36UntXT5/KWu70PtjWZb1+uuvWzExMZa/v78VFxdnvfnmm9aYMWOsmJiYNrWrVq2yEhMTrcDAQKtPnz5WfHy89dRTT1lHjx61a5xOp5Wenm716dPnkm4poLNuh+Dt7W2FhoZaw4cPt6ZNm2ZVVVW1qT/38vkvvvjCys7OtmJiYqygoCArJCTESk5Ott544w23152vrwt9f53v8vn09HTrnXfesb75zW9a/v7+VkxMTLvvb2VlpZWcnGz5+flZgwcPtl555ZV2t3m+3s69fL7V2rVrreHDh1v+/v5WWFiYNWHCBOvzzz93q8nMzLSCgoLa9HS+y/oBT+FvjQHo9hISEjRgwIA2tygAgIvhHCEA3UZTU1Obj/82bdqk999//7L/NAYAs3FECEC3cejQIaWmpuqhhx6Sw+HQ/v37VVBQoJCQEO3bt++y7rUEwGycLA2g2+jbt68SExP129/+VsePH1dQUJDS09P14osvEoIAXBaOCAEAAGNxjhAAADAWQQgAABiLc4QuoKWlRUePHlWfPn08ert7AADQeSzL0pdffimHwyFv7wsf8yEIXcDRo0cVFRXV1W0AAIDLcOTIEQ0aNOiCNQShC2j9o4lHjhxRcHBwF3cDAAAuhcvlUlRU1EX/+LFEELqg1o/DgoODCUIAAHQzl3JaCydLAwAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACMRRACAADGIggBAABjEYQAAICxCEIAAMBYBCEAAGAsghAAADAWQQgAABjLt6sbMNnQ2Ru6ugXginXoxfSubgGAATgiBAAAjEUQAgAAxiIIAQAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACMRRACAADGIggBAABjEYQAAICxCEIAAMBYBCEAAGAsghAAADAWQQgAABiLIAQAAIzV4SBUXl6u+++/Xw6HQ15eXlq/fr0919TUpFmzZik+Pl5BQUFyOByaOHGijh496raN2tpaTZgwQcHBwQoNDVVWVpZOnjzpVvPBBx/o9ttvV0BAgKKiorRw4cI2vRQVFSkmJkYBAQGKj4/XW2+95TZvWZby8vI0cOBABQYGKjU1VQcPHuzokgEAQA/V4SDU0NCgYcOGafny5W3mvvrqK+3atUvPPvusdu3apT/96U86cOCAvvvd77rVTZgwQVVVVSotLVVxcbHKy8s1ZcoUe97lcmn06NEaMmSIKisr9fLLL2v+/PlatWqVXbN161aNHz9eWVlZ2r17tzIyMpSRkaF9+/bZNQsXLtTSpUtVUFCgbdu2KSgoSGlpaTp16lRHlw0AAHogL8uyrMt+sZeX1q1bp4yMjPPW7NixQyNGjNBnn32mwYMH66OPPlJsbKx27NihpKQkSVJJSYnuvfdeff7553I4HFq5cqWeeeYZOZ1O+fn5SZJmz56t9evXa//+/ZKksWPHqqGhQcXFxfa+Ro4cqYSEBBUUFMiyLDkcDs2YMUMzZ86UJNXX1ysiIkKFhYUaN27cRdfncrkUEhKi+vp6BQcHX+7bdF5DZ2/w+DaBnuLQi+ld3QKAbqojv787/Ryh+vp6eXl5KTQ0VJJUUVGh0NBQOwRJUmpqqry9vbVt2za75o477rBDkCSlpaXpwIEDOnHihF2Tmprqtq+0tDRVVFRIkqqrq+V0Ot1qQkJClJycbNecq7GxUS6Xy+0BAAB6rk4NQqdOndKsWbM0fvx4O5E5nU6Fh4e71fn6+iosLExOp9OuiYiIcKtpfX6xmrPnz35dezXnys/PV0hIiP2Iiorq8JoBAED30WlBqKmpST/60Y9kWZZWrlzZWbvxqDlz5qi+vt5+HDlypKtbAgAAnci3MzbaGoI+++wzbdy40e3zucjISB07dsyt/syZM6qtrVVkZKRdU1NT41bT+vxiNWfPt44NHDjQrSYhIaHdvv39/eXv79/R5QIAgG7K40eEWkPQwYMH9Ze//EX9+vVzm09JSVFdXZ0qKyvtsY0bN6qlpUXJycl2TXl5uZqamuya0tJS3XDDDerbt69dU1ZW5rbt0tJSpaSkSJKio6MVGRnpVuNyubRt2za7BgAAmK3DQejkyZPas2eP9uzZI+nrk5L37Nmjw4cPq6mpSQ888IB27typ1atXq7m5WU6nU06nU6dPn5Yk3Xjjjbrnnns0efJkbd++XVu2bFFOTo7GjRsnh8MhSXrwwQfl5+enrKwsVVVVae3atVqyZIlyc3PtPqZNm6aSkhItWrRI+/fv1/z587Vz507l5ORI+vqKtunTp2vBggV68803tXfvXk2cOFEOh+OCV7kBAABzdPjy+U2bNumuu+5qM56Zman58+crOjq63de99957uvPOOyV9fUPFnJwc/fnPf5a3t7fGjBmjpUuXqnfv3nb9Bx98oOzsbO3YsUP9+/fX1KlTNWvWLLdtFhUVae7cuTp06JCuu+46LVy4UPfee689b1mW5s2bp1WrVqmurk633XabVqxYoeuvv/6S1srl80DX4fJ5AJerI7+//6X7CPV0BCGg6xCEAFyuK+o+QgAAAFcqghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYCyCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAQAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACMRRACAADGIggBAABjEYQAAICxCEIAAMBYBCEAAGAsghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYCyCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAQAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACM1eEgVF5ervvvv18Oh0NeXl5av36927xlWcrLy9PAgQMVGBio1NRUHTx40K2mtrZWEyZMUHBwsEJDQ5WVlaWTJ0+61XzwwQe6/fbbFRAQoKioKC1cuLBNL0VFRYqJiVFAQIDi4+P11ltvdbgXAABgrg4HoYaGBg0bNkzLly9vd37hwoVaunSpCgoKtG3bNgUFBSktLU2nTp2yayZMmKCqqiqVlpaquLhY5eXlmjJlij3vcrk0evRoDRkyRJWVlXr55Zc1f/58rVq1yq7ZunWrxo8fr6ysLO3evVsZGRnKyMjQvn37OtQLAAAwl5dlWdZlv9jLS+vWrVNGRoakr4/AOBwOzZgxQzNnzpQk1dfXKyIiQoWFhRo3bpw++ugjxcbGaseOHUpKSpIklZSU6N5779Xnn38uh8OhlStX6plnnpHT6ZSfn58kafbs2Vq/fr32798vSRo7dqwaGhpUXFxs9zNy5EglJCSooKDgknq5GJfLpZCQENXX1ys4OPhy36bzGjp7g8e3CfQUh15M7+oWAHRTHfn97dFzhKqrq+V0OpWammqPhYSEKDk5WRUVFZKkiooKhYaG2iFIklJTU+Xt7a1t27bZNXfccYcdgiQpLS1NBw4c0IkTJ+yas/fTWtO6n0vp5VyNjY1yuVxuDwAA0HN5NAg5nU5JUkREhNt4RESEPed0OhUeHu427+vrq7CwMLea9rZx9j7OV3P2/MV6OVd+fr5CQkLsR1RU1CWsGgAAdFdcNXaWOXPmqL6+3n4cOXKkq1sCAACdyKNBKDIyUpJUU1PjNl5TU2PPRUZG6tixY27zZ86cUW1trVtNe9s4ex/nqzl7/mK9nMvf31/BwcFuDwAA0HN5NAhFR0crMjJSZWVl9pjL5dK2bduUkpIiSUpJSVFdXZ0qKyvtmo0bN6qlpUXJycl2TXl5uZqamuya0tJS3XDDDerbt69dc/Z+Wmta93MpvQAAALN1OAidPHlSe/bs0Z49eyR9fVLynj17dPjwYXl5eWn69OlasGCB3nzzTe3du1cTJ06Uw+Gwryy78cYbdc8992jy5Mnavn27tmzZopycHI0bN04Oh0OS9OCDD8rPz09ZWVmqqqrS2rVrtWTJEuXm5tp9TJs2TSUlJVq0aJH279+v+fPna+fOncrJyZGkS+oFAACYzbejL9i5c6fuuusu+3lrOMnMzFRhYaGeeuopNTQ0aMqUKaqrq9Ntt92mkpISBQQE2K9ZvXq1cnJydPfdd8vb21tjxozR0qVL7fmQkBC9++67ys7OVmJiovr376+8vDy3ew3dcsstWrNmjebOnaunn35a1113ndavX6+4uDi75lJ6AQAA5vqX7iPU03EfIaDrcB8hAJery+4jBAAA0J0QhAAAgLEIQgAAwFgEIQAAYCyCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAQAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACMRRACAADGIggBAABjEYQAAICxCEIAAMBYBCEAAGAsghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYCyCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAQAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACMRRACAADGIggBAABjEYQAAICxCEIAAMBYBCEAAGAsjweh5uZmPfvss4qOjlZgYKCuueYaPf/887Isy66xLEt5eXkaOHCgAgMDlZqaqoMHD7ptp7a2VhMmTFBwcLBCQ0OVlZWlkydPutV88MEHuv322xUQEKCoqCgtXLiwTT9FRUWKiYlRQECA4uPj9dZbb3l6yQAAoJvyeBB66aWXtHLlSi1btkwfffSRXnrpJS1cuFC//vWv7ZqFCxdq6dKlKigo0LZt2xQUFKS0tDSdOnXKrpkwYYKqqqpUWlqq4uJilZeXa8qUKfa8y+XS6NGjNWTIEFVWVurll1/W/PnztWrVKrtm69atGj9+vLKysrR7925lZGQoIyND+/bt8/SyAQBAN+RlnX2oxgPuu+8+RURE6He/+509NmbMGAUGBuqPf/yjLMuSw+HQjBkzNHPmTElSfX29IiIiVFhYqHHjxumjjz5SbGysduzYoaSkJElSSUmJ7r33Xn3++edyOBxauXKlnnnmGTmdTvn5+UmSZs+erfXr12v//v2SpLFjx6qhoUHFxcV2LyNHjlRCQoIKCgouuhaXy6WQkBDV19crODjYY+9Rq6GzN3h8m0BPcejF9K5uAUA31ZHf3x4/InTLLbeorKxMH3/8sSTp/fff19/+9jd95zvfkSRVV1fL6XQqNTXVfk1ISIiSk5NVUVEhSaqoqFBoaKgdgiQpNTVV3t7e2rZtm11zxx132CFIktLS0nTgwAGdOHHCrjl7P601rfs5V2Njo1wul9sDAAD0XL6e3uDs2bPlcrkUExMjHx8fNTc36xe/+IUmTJggSXI6nZKkiIgIt9dFRETYc06nU+Hh4e6N+voqLCzMrSY6OrrNNlrn+vbtK6fTecH9nCs/P18///nPL2fZAACgG/L4EaE33nhDq1ev1po1a7Rr1y699tpr+uUvf6nXXnvN07vyuDlz5qi+vt5+HDlypKtbAgAAncjjR4SefPJJzZ49W+PGjZMkxcfH67PPPlN+fr4yMzMVGRkpSaqpqdHAgQPt19XU1CghIUGSFBkZqWPHjrlt98yZM6qtrbVfHxkZqZqaGrea1ucXq2mdP5e/v7/8/f0vZ9kAAKAb8vgRoa+++kre3u6b9fHxUUtLiyQpOjpakZGRKisrs+ddLpe2bdumlJQUSVJKSorq6upUWVlp12zcuFEtLS1KTk62a8rLy9XU1GTXlJaW6oYbblDfvn3tmrP301rTuh8AAGA2jweh+++/X7/4xS+0YcMGHTp0SOvWrdMrr7yi73//+5IkLy8vTZ8+XQsWLNCbb76pvXv3auLEiXI4HMrIyJAk3Xjjjbrnnns0efJkbd++XVu2bFFOTo7GjRsnh8MhSXrwwQfl5+enrKwsVVVVae3atVqyZIlyc3PtXqZNm6aSkhItWrRI+/fv1/z587Vz507l5OR4etkAAKAb8vhHY7/+9a/17LPP6vHHH9exY8fkcDj0yCOPKC8vz6556qmn1NDQoClTpqiurk633XabSkpKFBAQYNesXr1aOTk5uvvuu+Xt7a0xY8Zo6dKl9nxISIjeffddZWdnKzExUf3791deXp7bvYZuueUWrVmzRnPnztXTTz+t6667TuvXr1dcXJynlw0AALohj99HqCfhPkJA1+E+QgAuV5feRwgAAKC7IAgBAABjEYQAAICxCEIAAMBYBCEAAGAsghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYCyCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAQAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACMRRACAADGIggBAABjEYQAAICxCEIAAMBYBCEAAGAsghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYCyCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAQAAYxGEAACAsQhCAADAWJ0ShP7xj3/ooYceUr9+/RQYGKj4+Hjt3LnTnrcsS3l5eRo4cKACAwOVmpqqgwcPum2jtrZWEyZMUHBwsEJDQ5WVlaWTJ0+61XzwwQe6/fbbFRAQoKioKC1cuLBNL0VFRYqJiVFAQIDi4+P11ltvdcaSAQBAN+TxIHTixAndeuut6tWrl95++219+OGHWrRokfr27WvXLFy4UEuXLlVBQYG2bdumoKAgpaWl6dSpU3bNhAkTVFVVpdLSUhUXF6u8vFxTpkyx510ul0aPHq0hQ4aosrJSL7/8subPn69Vq1bZNVu3btX48eOVlZWl3bt3KyMjQxkZGdq3b5+nlw0AALohL8uyLE9ucPbs2dqyZYv++te/tjtvWZYcDodmzJihmTNnSpLq6+sVERGhwsJCjRs3Th999JFiY2O1Y8cOJSUlSZJKSkp077336vPPP5fD4dDKlSv1zDPPyOl0ys/Pz973+vXrtX//fknS2LFj1dDQoOLiYnv/I0eOVEJCggoKCi66FpfLpZCQENXX1ys4OPhfel/aM3T2Bo9vE+gpDr2Y3tUtAOimOvL72+NHhN58800lJSXphz/8ocLDwzV8+HD953/+pz1fXV0tp9Op1NRUeywkJETJycmqqKiQJFVUVCg0NNQOQZKUmpoqb29vbdu2za6544477BAkSWlpaTpw4IBOnDhh15y9n9aa1v2cq7GxUS6Xy+0BAAB6Lo8HoU8//VQrV67Uddddp3feeUePPfaYfvazn+m1116TJDmdTklSRESE2+siIiLsOafTqfDwcLd5X19fhYWFudW0t42z93G+mtb5c+Xn5yskJMR+REVFdXj9AACg+/B4EGppadFNN92kF154QcOHD9eUKVM0efLkS/ooqqvNmTNH9fX19uPIkSNd3RIAAOhEHg9CAwcOVGxsrNvYjTfeqMOHD0uSIiMjJUk1NTVuNTU1NfZcZGSkjh075jZ/5swZ1dbWutW0t42z93G+mtb5c/n7+ys4ONjtAQAAei6PB6Fbb71VBw4ccBv7+OOPNWTIEElSdHS0IiMjVVZWZs+7XC5t27ZNKSkpkqSUlBTV1dWpsrLSrtm4caNaWlqUnJxs15SXl6upqcmuKS0t1Q033GBfoZaSkuK2n9aa1v0AAACzeTwIPfHEE/r73/+uF154QZ988onWrFmjVatWKTs7W5Lk5eWl6dOna8GCBXrzzTe1d+9eTZw4UQ6HQxkZGZK+PoJ0zz33aPLkydq+fbu2bNminJwcjRs3Tg6HQ5L04IMPys/PT1lZWaqqqtLatWu1ZMkS5ebm2r1MmzZNJSUlWrRokfbv36/58+dr586dysnJ8fSyAQBAN+Tr6Q3efPPNWrdunebMmaPnnntO0dHRWrx4sSZMmGDXPPXUU2poaNCUKVNUV1en2267TSUlJQoICLBrVq9erZycHN19993y9vbWmDFjtHTpUns+JCRE7777rrKzs5WYmKj+/fsrLy/P7V5Dt9xyi9asWaO5c+fq6aef1nXXXaf169crLi7O08sGAADdkMfvI9STcB8hoOtwHyEAl6tL7yMEAADQXRCEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACMRRACAADGIggBAABjEYQAAICxCEIAAMBYBCEAAGAsghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYCyCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAQAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACMRRACAADGIggBAABjEYQAAICxCEIAAMBYBCEAAGAsghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYKxOD0IvvviivLy8NH36dHvs1KlTys7OVr9+/dS7d2+NGTNGNTU1bq87fPiw0tPTddVVVyk8PFxPPvmkzpw541azadMm3XTTTfL399e1116rwsLCNvtfvny5hg4dqoCAACUnJ2v79u2dsUwAANANdWoQ2rFjh37zm9/om9/8ptv4E088oT//+c8qKirS5s2bdfToUf3gBz+w55ubm5Wenq7Tp09r69ateu2111RYWKi8vDy7prq6Wunp6brrrru0Z88eTZ8+XT/96U/1zjvv2DVr165Vbm6u5s2bp127dmnYsGFKS0vTsWPHOnPZAACgm/CyLMvqjA2fPHlSN910k1asWKEFCxYoISFBixcvVn19vQYMGKA1a9bogQcekCTt379fN954oyoqKjRy5Ei9/fbbuu+++3T06FFFRERIkgoKCjRr1iwdP35cfn5+mjVrljZs2KB9+/bZ+xw3bpzq6upUUlIiSUpOTtbNN9+sZcuWSZJaWloUFRWlqVOnavbs2Rddg8vlUkhIiOrr6xUcHOzpt0hDZ2/w+DaBnuLQi+ld3QKAbqojv7877YhQdna20tPTlZqa6jZeWVmppqYmt/GYmBgNHjxYFRUVkqSKigrFx8fbIUiS0tLS5HK5VFVVZdecu+20tDR7G6dPn1ZlZaVbjbe3t1JTU+2aczU2Nsrlcrk9AABAz+XbGRt9/fXXtWvXLu3YsaPNnNPplJ+fn0JDQ93GIyIi5HQ67ZqzQ1DrfOvchWpcLpf++c9/6sSJE2pubm63Zv/+/e32nZ+fr5///OeXvlAAANCtefyI0JEjRzRt2jStXr1aAQEBnt58p5ozZ47q6+vtx5EjR7q6JQAA0Ik8HoQqKyt17Ngx3XTTTfL19ZWvr682b96spUuXytfXVxERETp9+rTq6urcXldTU6PIyEhJUmRkZJuryFqfX6wmODhYgYGB6t+/v3x8fNqtad3Gufz9/RUcHOz2AAAAPZfHg9Ddd9+tvXv3as+ePfYjKSlJEyZMsP+7V69eKisrs19z4MABHT58WCkpKZKklJQU7d271+3qrtLSUgUHBys2NtauOXsbrTWt2/Dz81NiYqJbTUtLi8rKyuwaAABgNo+fI9SnTx/FxcW5jQUFBalfv372eFZWlnJzcxUWFqbg4GBNnTpVKSkpGjlypCRp9OjRio2N1Y9//GMtXLhQTqdTc+fOVXZ2tvz9/SVJjz76qJYtW6annnpKP/nJT7Rx40a98cYb2rDh/1+JlZubq8zMTCUlJWnEiBFavHixGhoaNGnSJE8vGwAAdEOdcrL0xfzqV7+St7e3xowZo8bGRqWlpWnFihX2vI+Pj4qLi/XYY48pJSVFQUFByszM1HPPPWfXREdHa8OGDXriiSe0ZMkSDRo0SL/97W+VlpZm14wdO1bHjx9XXl6enE6nEhISVFJS0uYEagAAYKZOu49QT8B9hICuw32EAFyuK+I+QgAAAFc6ghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYCyCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAQAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACMRRACAADGIggBAABjEYQAAICxCEIAAMBYBCEAAGAsghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYCyCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAQAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACM5fEglJ+fr5tvvll9+vRReHi4MjIydODAAbeaU6dOKTs7W/369VPv3r01ZswY1dTUuNUcPnxY6enpuuqqqxQeHq4nn3xSZ86ccavZtGmTbrrpJvn7++vaa69VYWFhm36WL1+uoUOHKiAgQMnJydq+fbunlwwAALopjwehzZs3Kzs7W3//+99VWlqqpqYmjR49Wg0NDXbNE088oT//+c8qKirS5s2bdfToUf3gBz+w55ubm5Wenq7Tp09r69ateu2111RYWKi8vDy7prq6Wunp6brrrru0Z88eTZ8+XT/96U/1zjvv2DVr165Vbm6u5s2bp127dmnYsGFKS0vTsWPHPL1sAADQDXlZlmV15g6OHz+u8PBwbd68WXfccYfq6+s1YMAArVmzRg888IAkaf/+/brxxhtVUVGhkSNH6u2339Z9992no0ePKiIiQpJUUFCgWbNm6fjx4/Lz89OsWbO0YcMG7du3z97XuHHjVFdXp5KSEklScnKybr75Zi1btkyS1NLSoqioKE2dOlWzZ8++aO8ul0shISGqr69XcHCwp98aDZ29wePbBHqKQy+md3ULALqpjvz+7vRzhOrr6yVJYWFhkqTKyko1NTUpNTXVromJidHgwYNVUVEhSaqoqFB8fLwdgiQpLS1NLpdLVVVVds3Z22itad3G6dOnVVlZ6Vbj7e2t1NRUu+ZcjY2Ncrlcbg8AANBzdWoQamlp0fTp03XrrbcqLi5OkuR0OuXn56fQ0FC32oiICDmdTrvm7BDUOt86d6Eal8ulf/7zn/riiy/U3Nzcbk3rNs6Vn5+vkJAQ+xEVFXV5CwcAAN1Cpwah7Oxs7du3T6+//npn7sZj5syZo/r6evtx5MiRrm4JAAB0It/O2nBOTo6Ki4tVXl6uQYMG2eORkZE6ffq06urq3I4K1dTUKDIy0q459+qu1qvKzq4590qzmpoaBQcHKzAwUD4+PvLx8Wm3pnUb5/L395e/v//lLRgAAHQ7Hj8iZFmWcnJytG7dOm3cuFHR0dFu84mJierVq5fKysrssQMHDujw4cNKSUmRJKWkpGjv3r1uV3eVlpYqODhYsbGxds3Z22itad2Gn5+fEhMT3WpaWlpUVlZm1wAAALN5/IhQdna21qxZo//5n/9Rnz597PNxQkJCFBgYqJCQEGVlZSk3N1dhYWEKDg7W1KlTlZKSopEjR0qSRo8erdjYWP34xz/WwoUL5XQ6NXfuXGVnZ9tHbB599FEtW7ZMTz31lH7yk59o48aNeuONN7Rhw/+/Eis3N1eZmZlKSkrSiBEjtHjxYjU0NGjSpEmeXjYAAOiGPB6EVq5cKUm688473cZfffVVPfzww5KkX/3qV/L29taYMWPU2NiotLQ0rVixwq718fFRcXGxHnvsMaWkpCgoKEiZmZl67rnn7Jro6Ght2LBBTzzxhJYsWaJBgwbpt7/9rdLS0uyasWPH6vjx48rLy5PT6VRCQoJKSkranEANAADM1On3EerOuI8Q0HW4jxCAy3VF3UcIAADgSkUQAgAAxiIIAQAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwFkEIAAAYiyAEAACMRRACAADGIggBAABjEYQAAICxCEIAAMBYBCEAAGAsghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYCyCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAQAAY/l2dQMA0JMNnb2hq1sArmiHXkzv0v1zRAgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFgEIQAAYCyCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAQAAYxGEAACAsQhCAADAWAQhAABgLIIQAAAwlhFBaPny5Ro6dKgCAgKUnJys7du3d3VLAADgCtDjg9DatWuVm5urefPmadeuXRo2bJjS0tJ07Nixrm4NAAB0sR4fhF555RVNnjxZkyZNUmxsrAoKCnTVVVfp97//fVe3BgAAuphvVzfQmU6fPq3KykrNmTPHHvP29lZqaqoqKira1Dc2NqqxsdF+Xl9fL0lyuVyd0l9L41edsl2gJ+isn7t/N37OgQvrjJ/11m1alnXR2h4dhL744gs1NzcrIiLCbTwiIkL79+9vU5+fn6+f//znbcajoqI6rUcA7QtZ3NUdAPh36Myf9S+//FIhISEXrOnRQaij5syZo9zcXPt5S0uLamtr1a9fP3l5eXVhZ+hsLpdLUVFROnLkiIKDg7u6HQCdhJ91M1iWpS+//FIOh+OitT06CPXv318+Pj6qqalxG6+pqVFkZGSben9/f/n7+7uNhYaGdmaLuMIEBwfzP0fAAPys93wXOxLUqkefLO3n56fExESVlZXZYy0tLSorK1NKSkoXdgYAAK4EPfqIkCTl5uYqMzNTSUlJGjFihBYvXqyGhgZNmjSpq1sDAABdrMcHobFjx+r48ePKy8uT0+lUQkKCSkpK2pxADbP5+/tr3rx5bT4aBdCz8LOOc3lZl3JtGQAAQA/Uo88RAgAAuBCCEAAAMBZBCAAAGIsgBAAAjEUQAgAAxiIIAZKWL1+uoUOHKiAgQMnJydq+fXtXtwTAg8rLy3X//ffL4XDIy8tL69ev7+qWcIUgCMF4a9euVW5urubNm6ddu3Zp2LBhSktL07Fjx7q6NQAe0tDQoGHDhmn58uVd3QquMNxHCMZLTk7WzTffrGXLlkn6+s+wREVFaerUqZo9e3YXdwfA07y8vLRu3TplZGR0dSu4AnBECEY7ffq0KisrlZqaao95e3srNTVVFRUVXdgZAODfgSAEo33xxRdqbm5u8ydXIiIi5HQ6u6grAMC/C0EIAAAYiyAEo/Xv318+Pj6qqalxG6+pqVFkZGQXdQUA+HchCMFofn5+SkxMVFlZmT3W0tKisrIypaSkdGFnAIB/B9+ubgDoarm5ucrMzFRSUpJGjBihxYsXq6GhQZMmTerq1gB4yMmTJ/XJJ5/Yz6urq7Vnzx6FhYVp8ODBXdgZuhqXzwOSli1bppdffllOp1MJCQlaunSpkpOTu7otAB6yadMm3XXXXW3GMzMzVVhY+O9vCFcMghAAADAW5wgBAABjEYQAAICxCEIAAMBYBCEAAGAsghAAADAWQQgAABiLIAQAAIxFEAIAAMYiCAEAAGMRhAAAgLEIQgAAwFj/D43Yvkl9sK+NAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "#Visualize fraudulent activities\n",
    "fraudactivities = client_train.groupby(['target'])['client_id'].count()\n",
    "plt.bar(x=fraudactivities.index, height=fraudactivities.values, tick_label = [0,1])\n",
    "plt.title('Fraud - Target Distribution')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fQb7RZf_Ufn9"
   },
   "source": [
    "Target is highly imbalanced with fewer cases of fraudulent activities"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 545
    },
    "id": "v8G8BPgMUrVW",
    "outputId": "886941f9-3a90-48c2-ddac-75728bd7b935"
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjkAAAGzCAYAAADNKAZOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/P9b71AAAACXBIWXMAAA9hAAAPYQGoP6dpAAA8kUlEQVR4nO3de1hVZd7/8Q8H9wbRjXkA5AGVtFISNVFxT2aa5FYp88kaNcfQPKQDpjKPGfNz8NDM2GgHLTVznMKZyUntmazEIMLUSsrEGA8lY2Zi6UbLZCcpKKzfH8/FGneiiCdk+X5d17pyr/u77v1ddyaf1l5r62MYhiEAAACL8a3tBgAAAK4EQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg5gETNnzpSPj4/XvlatWmnkyJGX7T1GjhypVq1aXbb5amLDhg3y8fHRhg0baqWfn69lenq6fHx8tHXr1qvy/r169VKvXr2uynsBVkHIAXDF/fGPf9SaNWtquw1J0ueff66ZM2fq66+/ru1WznIt9wbURf613QCAK6egoEC+vpfv/2X+/Oc/q6KiosbH/fGPf9QDDzygQYMGXbZeLrafzz//XLNmzVKvXr1qdBXocq9lVc7X27vvvntF3xuwIkIOYGF2u/2yzFNSUqKgoCDVq1fvssx3uVzpfgzD0MmTJxUYGHjZ1vJi2Wy2Wn1/oC7i4yqgDvrwww/VtWtXBQQEqHXr1nrppZeqrPv5fSSnTp3SrFmzdNNNNykgIEBNmjRRjx49lJ2dbdaMHDlSDRo00N69ezVgwAA1bNhQw4cPN8d+foWhoqJCCxYsUExMjAICAtSsWTP169fPvFfFx8dHJSUlWr58uXx8fOTj41PtfULffPONBg0apKCgIIWEhGjKlCkqLS09q66qfl577TXFxsaqYcOGcjgciomJ0YIFCyT93300Dz74oCSpd+/eZj+V9/m0atVK99xzj7KystSlSxcFBgaaa3uu+5t++uknPfroo2rSpIkcDocefvhh/fDDD141Pj4+mjlz5lnHnjlndb1VdU/O4cOHNXr0aIWGhiogIEAdO3bU8uXLvWq+/vpr+fj46Omnn9bSpUvVunVr2e12de3aVZ9++ulZPQFWwpUcoI7ZsWOH+vbtq2bNmmnmzJk6ffq0ZsyYodDQ0GqPnTlzpubMmaMxY8aoW7du8ng82rp1q7Zt26a7777brDt9+rRcLpd69Oihp59+WvXr1z/nnKNHj1Z6err69++vMWPG6PTp0/rggw/08ccfq0uXLvrb3/5mvt+4ceMkSa1btz7nfCdOnFCfPn1UWFioxx57TOHh4frb3/6m9evXV3t+2dnZGjZsmPr06aM//elPkqQvvvhCH330kSZNmqSePXvqscce0/PPP6/f/va3ateunSSZ/5T+72OpYcOG6dFHH9XYsWN1yy23nPc9k5OT1ahRI82cOVMFBQV68cUXtX//fvNG6Qt1Ib2d6cSJE+rVq5e+/PJLJScnKyoqSqtXr9bIkSN17NgxTZo0yat+xYoV+vHHH/Xoo4/Kx8dHc+fO1f3336+vvvrqmrtCB1w2BoA6ZdCgQUZAQICxf/9+c9/nn39u+Pn5GT//T7ply5ZGYmKi+bpjx45GQkLCeedPTEw0JBlPPPFElWMtW7Y0X69fv96QZDz22GNn1VZUVJi/DgoK8urjfObPn29IMlatWmXuKykpMdq0aWNIMt5///1z9jNp0iTD4XAYp0+fPuf8q1evPmueSi1btjQkGZmZmVWOnXkOr7zyiiHJiI2NNcrKysz9c+fONSQZb775prlPkjFjxoxq5zxfb3feeadx5513mq8r1+nvf/+7ua+srMxwOp1GgwYNDI/HYxiGYezbt8+QZDRp0sQ4evSoWfvmm28akoy33377rPcCrIKPq4A6pLy8XFlZWRo0aJBatGhh7m/Xrp1cLle1xzdq1Ei7du3Snj17qq2dMGFCtTX/+7//Kx8fH82YMeOssZpcxTjTunXr1Lx5cz3wwAPmvvr165tXgc6nUaNGKikp8fr4raaioqIuaC0rjRs3zutKyIQJE+Tv769169ZddA8XYt26dQoLC9OwYcPMffXq1dNjjz2m48ePa+PGjV71Q4YM0Q033GC+vuOOOyRJX3311RXtE6hNhBygDjly5IhOnDihm2666ayx6j5WkaTZs2fr2LFjuvnmmxUTE6OpU6dq+/btZ9X5+/srIiKi2vn27t2r8PBwNW7c+MJO4ALs379fbdq0OSskXcj5/frXv9bNN9+s/v37KyIiQo888ogyMzNr9P5RUVE1qv/5v4sGDRqoefPmV/wx8P379+umm24664mvyo+39u/f77X/zFAsyQw8P79/CLASQg5wHenZs6f27t2rl19+We3bt9eyZcvUuXNnLVu2zKvObrdf8celr4SQkBDl5+frrbfe0sCBA/X++++rf//+SkxMvOA5AgMDr2CH3srLy6/ae/n5+VW53zCMq9YDcLXVvT/FgOtYs2bNFBgYWOXHTQUFBRc0R+PGjTVq1Cj94x//0IEDB9ShQ4cqn/y5EK1bt9bBgwd19OjR89bV5KOrli1bau/evWf98L3Q87PZbLr33nu1ePFi7d27V48++qj++te/6ssvv6xxLxfi5/8ujh8/rkOHDnk99XXDDTfo2LFjXnVlZWU6dOiQ176artOePXvO+p6g3bt3m+PA9Y6QA9Qhfn5+crlcWrNmjQoLC839X3zxhbKysqo9/vvvv/d63aBBA7Vp06bKx7MvxODBg2UYhmbNmnXW2JkhJSgo6Kwf8ucyYMAAHTx4UK+//rq576efftLSpUurPfbn5+fr66sOHTpIknmOQUFBknTB/VRn6dKlOnXqlPn6xRdf1OnTp9W/f39zX+vWrbVp06azjvv5lZya9DZgwAC53W6tXLnS3Hf69Gm98MILatCgge68886LOR3AUniEHKhjZs2apczMTN1xxx369a9/bf5gu/XWW6u8v+ZM0dHR6tWrl2JjY9W4cWNt3bpVr7/+upKTky+ql969e2vEiBF6/vnntWfPHvXr108VFRX64IMP1Lt3b3Pe2NhYvffee3r22WcVHh6uqKgoxcXFVTnn2LFjtXDhQj388MPKy8tT8+bN9be//e28j7FXGjNmjI4ePaq77rpLERER2r9/v1544QV16tTJvFelU6dO8vPz05/+9CcVFxfLbrfrrrvuUkhIyEWtQVlZmfr06aNf/vKXKigo0OLFi9WjRw8NHDjQq6/x48dr8ODBuvvuu/Wvf/1LWVlZatq0qddcNelt3LhxeumllzRy5Ejl5eWpVatWev311/XRRx9p/vz5atiw4UWdD2AptftwF4CLsXHjRiM2Ntaw2WzGjTfeaCxZssSYMWNGtY+Q//73vze6detmNGrUyAgMDDTatm1r/OEPf/B6BDoxMdEICgqq8n1//si2YRjG6dOnjXnz5hlt27Y1bDab0axZM6N///5GXl6eWbN7926jZ8+eRmBgoCGp2sfJ9+/fbwwcONCoX7++0bRpU2PSpElGZmZmtY+Qv/7660bfvn2NkJAQw2azGS1atDAeffRR49ChQ17z//nPfzZuvPFG87H7yjlbtmx5zkfsz/UI+caNG41x48YZN9xwg9GgQQNj+PDhxvfff+91bHl5uTFt2jSjadOmRv369Q2Xy2V8+eWXZ815vt5+/gi5YRhGUVGRMWrUKKNp06aGzWYzYmJijFdeecWrpvIR8nnz5p11TjrHo+2AVfgYBnedAQAA6+GeHAAAYEmEHAAAYEmEHAAAYEmEHAAAYEmEHAAAYEmEHAAAYEnX9ZcBVlRU6ODBg2rYsOFl/6p3AABwZRiGoR9//FHh4eHn/Xv2ruuQc/DgQUVGRtZ2GwAA4CIcOHBAERER5xy/rkNO5deeHzhwQA6Ho5a7AQAAF8Lj8SgyMrLav77kug45lR9RORwOQg4AAHVMdbeacOMxAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJP9LOfipp55SamqqJk2apPnz50uSTp48qd/85jd67bXXVFpaKpfLpcWLFys0NNQ8rrCwUBMmTND777+vBg0aKDExUXPmzJG//3/a2bBhg1JSUrRr1y5FRkZq+vTpGjlypNf7L1q0SPPmzZPb7VbHjh31wgsvqFu3bpdySqijWj2RUdstnOXrpxJquwUAuK5d9JWcTz/9VC+99JI6dOjgtX/KlCl6++23tXr1am3cuFEHDx7U/fffb46Xl5crISFBZWVl2rx5s5YvX6709HSlpaWZNfv27VNCQoJ69+6t/Px8TZ48WWPGjFFWVpZZs3LlSqWkpGjGjBnatm2bOnbsKJfLpcOHD1/sKQEAAAvxMQzDqOlBx48fV+fOnbV48WL9/ve/V6dOnTR//nwVFxerWbNmWrFihR544AFJ0u7du9WuXTvl5uaqe/fueuedd3TPPffo4MGD5tWdJUuWaNq0aTpy5IhsNpumTZumjIwM7dy503zPoUOH6tixY8rMzJQkxcXFqWvXrlq4cKEkqaKiQpGRkZo4caKeeOKJKvsuLS1VaWmp+drj8SgyMlLFxcVyOBw1XQZcQ7iSAwDXD4/Ho+Dg4Gp/fl/UlZykpCQlJCQoPj7ea39eXp5OnTrltb9t27Zq0aKFcnNzJUm5ubmKiYnx+vjK5XLJ4/Fo165dZs3P53a5XOYcZWVlysvL86rx9fVVfHy8WVOVOXPmKDg42NwiIyMv5vQBAEAdUOOQ89prr2nbtm2aM2fOWWNut1s2m02NGjXy2h8aGiq3223WnBlwKscrx85X4/F4dOLECX333XcqLy+vsqZyjqqkpqaquLjY3A4cOHBhJw0AAOqcGt14fODAAU2aNEnZ2dkKCAi4Uj1dMXa7XXa7vbbbAAAAV0GNruTk5eXp8OHD6ty5s/z9/eXv76+NGzfq+eefl7+/v0JDQ1VWVqZjx455HVdUVKSwsDBJUlhYmIqKis4arxw7X43D4VBgYKCaNm0qPz+/Kmsq5wAAANe3GoWcPn36aMeOHcrPzze3Ll26aPjw4eav69Wrp5ycHPOYgoICFRYWyul0SpKcTqd27Njh9RRUdna2HA6HoqOjzZoz56isqZzDZrMpNjbWq6aiokI5OTlmDQAAuL7V6OOqhg0bqn379l77goKC1KRJE3P/6NGjlZKSosaNG8vhcGjixIlyOp3q3r27JKlv376Kjo7WiBEjNHfuXLndbk2fPl1JSUnmR0njx4/XwoUL9fjjj+uRRx7R+vXrtWrVKmVk/OcJmpSUFCUmJqpLly7q1q2b5s+fr5KSEo0aNeqSFgQAAFjDJX0ZYFWee+45+fr6avDgwV5fBljJz89Pa9eu1YQJE+R0OhUUFKTExETNnj3brImKilJGRoamTJmiBQsWKCIiQsuWLZPL5TJrhgwZoiNHjigtLU1ut1udOnVSZmbmWTcjAwCA69NFfU+OVVzoc/a49vE9OQBw/bii35MDAABwrSPkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAAS6pRyHnxxRfVoUMHORwOORwOOZ1OvfPOO+Z4r1695OPj47WNHz/ea47CwkIlJCSofv36CgkJ0dSpU3X69Gmvmg0bNqhz586y2+1q06aN0tPTz+pl0aJFatWqlQICAhQXF6ctW7bU5FQAAIDF1SjkRERE6KmnnlJeXp62bt2qu+66S/fdd5927dpl1owdO1aHDh0yt7lz55pj5eXlSkhIUFlZmTZv3qzly5crPT1daWlpZs2+ffuUkJCg3r17Kz8/X5MnT9aYMWOUlZVl1qxcuVIpKSmaMWOGtm3bpo4dO8rlcunw4cOXshYAAMBCfAzDMC5lgsaNG2vevHkaPXq0evXqpU6dOmn+/PlV1r7zzju65557dPDgQYWGhkqSlixZomnTpunIkSOy2WyaNm2aMjIytHPnTvO4oUOH6tixY8rMzJQkxcXFqWvXrlq4cKEkqaKiQpGRkZo4caKeeOKJC+7d4/EoODhYxcXFcjgcF7kCuBa0eiKjtls4y9dPJdR2CwBgSRf68/ui78kpLy/Xa6+9ppKSEjmdTnP/q6++qqZNm6p9+/ZKTU3VTz/9ZI7l5uYqJibGDDiS5HK55PF4zKtBubm5io+P93ovl8ul3NxcSVJZWZny8vK8anx9fRUfH2/WnEtpaak8Ho/XBgAArMm/pgfs2LFDTqdTJ0+eVIMGDfTGG28oOjpakvTQQw+pZcuWCg8P1/bt2zVt2jQVFBTon//8pyTJ7XZ7BRxJ5mu3233eGo/HoxMnTuiHH35QeXl5lTW7d+8+b+9z5szRrFmzanrKAACgDqpxyLnllluUn5+v4uJivf7660pMTNTGjRsVHR2tcePGmXUxMTFq3ry5+vTpo71796p169aXtfGLkZqaqpSUFPO1x+NRZGRkLXYEAACulBqHHJvNpjZt2kiSYmNj9emnn2rBggV66aWXzqqNi4uTJH355Zdq3bq1wsLCznoKqqioSJIUFhZm/rNy35k1DodDgYGB8vPzk5+fX5U1lXOci91ul91ur8HZAgCAuuqSvyenoqJCpaWlVY7l5+dLkpo3by5Jcjqd2rFjh9dTUNnZ2XI4HOZHXk6nUzk5OV7zZGdnm/f92Gw2xcbGetVUVFQoJyfH694gAABwfavRlZzU1FT1799fLVq00I8//qgVK1Zow4YNysrK0t69e7VixQoNGDBATZo00fbt2zVlyhT17NlTHTp0kCT17dtX0dHRGjFihObOnSu3263p06crKSnJvMIyfvx4LVy4UI8//rgeeeQRrV+/XqtWrVJGxn+enklJSVFiYqK6dOmibt26af78+SopKdGoUaMu49IAAIC6rEYh5/Dhw3r44Yd16NAhBQcHq0OHDsrKytLdd9+tAwcO6L333jMDR2RkpAYPHqzp06ebx/v5+Wnt2rWaMGGCnE6ngoKClJiYqNmzZ5s1UVFRysjI0JQpU7RgwQJFRERo2bJlcrlcZs2QIUN05MgRpaWlye12q1OnTsrMzDzrZmQAAHD9uuTvyanL+J4c6+B7cgDg+nGhP79rfOMxAAC49vA/e2fjL+gEAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACWRMgBAACW5F/bDVhVqycyaruFs3z9VEJttwAAwFXDlRwAAGBJhBwAAGBJhBwAAGBJhBwAAGBJhBwAAGBJhBwAAGBJhBwAAGBJhBwAAGBJhBwAAGBJhBwAAGBJhBwAAGBJhBwAAGBJNQo5L774ojp06CCHwyGHwyGn06l33nnHHD958qSSkpLUpEkTNWjQQIMHD1ZRUZHXHIWFhUpISFD9+vUVEhKiqVOn6vTp0141GzZsUOfOnWW329WmTRulp6ef1cuiRYvUqlUrBQQEKC4uTlu2bKnJqQAAAIurUciJiIjQU089pby8PG3dulV33XWX7rvvPu3atUuSNGXKFL399ttavXq1Nm7cqIMHD+r+++83jy8vL1dCQoLKysq0efNmLV++XOnp6UpLSzNr9u3bp4SEBPXu3Vv5+fmaPHmyxowZo6ysLLNm5cqVSklJ0YwZM7Rt2zZ17NhRLpdLhw8fvtT1AAAAFuFjGIZxKRM0btxY8+bN0wMPPKBmzZppxYoVeuCBByRJu3fvVrt27ZSbm6vu3bvrnXfe0T333KODBw8qNDRUkrRkyRJNmzZNR44ckc1m07Rp05SRkaGdO3ea7zF06FAdO3ZMmZmZkqS4uDh17dpVCxculCRVVFQoMjJSEydO1BNPPHHBvXs8HgUHB6u4uFgOh+NSluEsrZ7IuKzzXQ5fP5VQ2y1cMaw3gOvd9fTn4IX+/L7oe3LKy8v12muvqaSkRE6nU3l5eTp16pTi4+PNmrZt26pFixbKzc2VJOXm5iomJsYMOJLkcrnk8XjMq0G5ublec1TWVM5RVlamvLw8rxpfX1/Fx8ebNedSWloqj8fjtQEAAGuqccjZsWOHGjRoILvdrvHjx+uNN95QdHS03G63bDabGjVq5FUfGhoqt9stSXK73V4Bp3K8cux8NR6PRydOnNB3332n8vLyKmsq5ziXOXPmKDg42NwiIyNrevoAAKCOqHHIueWWW5Sfn69PPvlEEyZMUGJioj7//PMr0dtll5qaquLiYnM7cOBAbbcEAACuEP+aHmCz2dSmTRtJUmxsrD799FMtWLBAQ4YMUVlZmY4dO+Z1NaeoqEhhYWGSpLCwsLOegqp8+urMmp8/kVVUVCSHw6HAwED5+fnJz8+vyprKOc7FbrfLbrfX9JQBAEAddMnfk1NRUaHS0lLFxsaqXr16ysnJMccKCgpUWFgop9MpSXI6ndqxY4fXU1DZ2dlyOByKjo42a86co7Kmcg6bzabY2FivmoqKCuXk5Jg1AAAANbqSk5qaqv79+6tFixb68ccftWLFCm3YsEFZWVkKDg7W6NGjlZKSosaNG8vhcGjixIlyOp3q3r27JKlv376Kjo7WiBEjNHfuXLndbk2fPl1JSUnmFZbx48dr4cKFevzxx/XII49o/fr1WrVqlTIy/nPXeEpKihITE9WlSxd169ZN8+fPV0lJiUaNGnUZlwYAANRlNQo5hw8f1sMPP6xDhw4pODhYHTp0UFZWlu6++25J0nPPPSdfX18NHjxYpaWlcrlcWrx4sXm8n5+f1q5dqwkTJsjpdCooKEiJiYmaPXu2WRMVFaWMjAxNmTJFCxYsUEREhJYtWyaXy2XWDBkyREeOHFFaWprcbrc6deqkzMzMs25GBgAA169L/p6cuozvybEO1hvA9e56+nPwin9PDgAAwLWsxk9XAbh8rqf/8wKAq40rOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJJqFHLmzJmjrl27qmHDhgoJCdGgQYNUUFDgVdOrVy/5+Ph4bePHj/eqKSwsVEJCgurXr6+QkBBNnTpVp0+f9qrZsGGDOnfuLLvdrjZt2ig9Pf2sfhYtWqRWrVopICBAcXFx2rJlS01OBwAAWFiNQs7GjRuVlJSkjz/+WNnZ2Tp16pT69u2rkpISr7qxY8fq0KFD5jZ37lxzrLy8XAkJCSorK9PmzZu1fPlypaenKy0tzazZt2+fEhIS1Lt3b+Xn52vy5MkaM2aMsrKyzJqVK1cqJSVFM2bM0LZt29SxY0e5XC4dPnz4YtcCAABYiH9NijMzM71ep6enKyQkRHl5eerZs6e5v379+goLC6tyjnfffVeff/653nvvPYWGhqpTp0568sknNW3aNM2cOVM2m01LlixRVFSUnnnmGUlSu3bt9OGHH+q5556Ty+WSJD377LMaO3asRo0aJUlasmSJMjIy9PLLL+uJJ56oyWkBAAALuqR7coqLiyVJjRs39tr/6quvqmnTpmrfvr1SU1P1008/mWO5ubmKiYlRaGiouc/lcsnj8WjXrl1mTXx8vNecLpdLubm5kqSysjLl5eV51fj6+io+Pt6sqUppaak8Ho/XBgAArKlGV3LOVFFRocmTJ+v2229X+/btzf0PPfSQWrZsqfDwcG3fvl3Tpk1TQUGB/vnPf0qS3G63V8CRZL52u93nrfF4PDpx4oR++OEHlZeXV1mze/fuc/Y8Z84czZo162JPGQAA1CEXHXKSkpK0c+dOffjhh177x40bZ/46JiZGzZs3V58+fbR37161bt364ju9DFJTU5WSkmK+9ng8ioyMrMWOAADAlXJRISc5OVlr167Vpk2bFBERcd7auLg4SdKXX36p1q1bKyws7KynoIqKiiTJvI8nLCzM3HdmjcPhUGBgoPz8/OTn51dlzbnuBZIku90uu91+YScJAADqtBrdk2MYhpKTk/XGG29o/fr1ioqKqvaY/Px8SVLz5s0lSU6nUzt27PB6Cio7O1sOh0PR0dFmTU5Ojtc82dnZcjqdkiSbzabY2FivmoqKCuXk5Jg1AADg+lajKzlJSUlasWKF3nzzTTVs2NC8hyY4OFiBgYHau3evVqxYoQEDBqhJkybavn27pkyZop49e6pDhw6SpL59+yo6OlojRozQ3Llz5Xa7NX36dCUlJZlXWcaPH6+FCxfq8ccf1yOPPKL169dr1apVysjIMHtJSUlRYmKiunTpom7dumn+/PkqKSkxn7YCAADXtxqFnBdffFHS/33h35leeeUVjRw5UjabTe+9954ZOCIjIzV48GBNnz7drPXz89PatWs1YcIEOZ1OBQUFKTExUbNnzzZroqKilJGRoSlTpmjBggWKiIjQsmXLzMfHJWnIkCE6cuSI0tLS5Ha71alTJ2VmZp51MzIAALg+1SjkGIZx3vHIyEht3Lix2nlatmypdevWnbemV69e+uyzz85bk5ycrOTk5GrfDwAAXH/4u6sAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAlEXIAAIAl1SjkzJkzR127dlXDhg0VEhKiQYMGqaCgwKvm5MmTSkpKUpMmTdSgQQMNHjxYRUVFXjWFhYVKSEhQ/fr1FRISoqlTp+r06dNeNRs2bFDnzp1lt9vVpk0bpaenn9XPokWL1KpVKwUEBCguLk5btmypyekAAAALq1HI2bhxo5KSkvTxxx8rOztbp06dUt++fVVSUmLWTJkyRW+//bZWr16tjRs36uDBg7r//vvN8fLyciUkJKisrEybN2/W8uXLlZ6errS0NLNm3759SkhIUO/evZWfn6/JkydrzJgxysrKMmtWrlyplJQUzZgxQ9u2bVPHjh3lcrl0+PDhS1kPAABgET6GYRgXe/CRI0cUEhKijRs3qmfPniouLlazZs20YsUKPfDAA5Kk3bt3q127dsrNzVX37t31zjvv6J577tHBgwcVGhoqSVqyZImmTZumI0eOyGazadq0acrIyNDOnTvN9xo6dKiOHTumzMxMSVJcXJy6du2qhQsXSpIqKioUGRmpiRMn6oknnrig/j0ej4KDg1VcXCyHw3Gxy1ClVk9kXNb5Loevn0qo7RaumLq63nW1bwDXnuvpz5ML/fl9SffkFBcXS5IaN24sScrLy9OpU6cUHx9v1rRt21YtWrRQbm6uJCk3N1cxMTFmwJEkl8slj8ejXbt2mTVnzlFZUzlHWVmZ8vLyvGp8fX0VHx9v1lSltLRUHo/HawMAANZ00SGnoqJCkydP1u2336727dtLktxut2w2mxo1auRVGxoaKrfbbdacGXAqxyvHzlfj8Xh04sQJfffddyovL6+ypnKOqsyZM0fBwcHmFhkZWfMTBwAAdcJFh5ykpCTt3LlTr7322uXs54pKTU1VcXGxuR04cKC2WwIAAFeI/8UclJycrLVr12rTpk2KiIgw94eFhamsrEzHjh3zuppTVFSksLAws+bnT0FVPn11Zs3Pn8gqKiqSw+FQYGCg/Pz85OfnV2VN5RxVsdvtstvtNT9hAABQ59ToSo5hGEpOTtYbb7yh9evXKyoqyms8NjZW9erVU05OjrmvoKBAhYWFcjqdkiSn06kdO3Z4PQWVnZ0th8Oh6Ohos+bMOSprKuew2WyKjY31qqmoqFBOTo5ZAwAArm81upKTlJSkFStW6M0331TDhg3N+1+Cg4MVGBio4OBgjR49WikpKWrcuLEcDocmTpwop9Op7t27S5L69u2r6OhojRgxQnPnzpXb7db06dOVlJRkXmUZP368Fi5cqMcff1yPPPKI1q9fr1WrVikj4z93jqekpCgxMVFdunRRt27dNH/+fJWUlGjUqFGXa20AAEAdVqOQ8+KLL0qSevXq5bX/lVde0ciRIyVJzz33nHx9fTV48GCVlpbK5XJp8eLFZq2fn5/Wrl2rCRMmyOl0KigoSImJiZo9e7ZZExUVpYyMDE2ZMkULFixQRESEli1bJpfLZdYMGTJER44cUVpamtxutzp16qTMzMyzbkYGAADXpxqFnAv5Sp2AgAAtWrRIixYtOmdNy5YttW7duvPO06tXL3322WfnrUlOTlZycnK1PQEAgOsPf3cVAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwJEIOAACwpBqHnE2bNunee+9VeHi4fHx8tGbNGq/xkSNHysfHx2vr16+fV83Ro0c1fPhwORwONWrUSKNHj9bx48e9arZv36477rhDAQEBioyM1Ny5c8/qZfXq1Wrbtq0CAgIUExOjdevW1fR0AACARdU45JSUlKhjx45atGjROWv69eunQ4cOmds//vEPr/Hhw4dr165dys7O1tq1a7Vp0yaNGzfOHPd4POrbt69atmypvLw8zZs3TzNnztTSpUvNms2bN2vYsGEaPXq0PvvsMw0aNEiDBg3Szp07a3pKAADAgvxrekD//v3Vv3//89bY7XaFhYVVOfbFF18oMzNTn376qbp06SJJeuGFFzRgwAA9/fTTCg8P16uvvqqysjK9/PLLstlsuvXWW5Wfn69nn33WDEMLFixQv379NHXqVEnSk08+qezsbC1cuFBLliyp8r1LS0tVWlpqvvZ4PDU9fQAAUEdckXtyNmzYoJCQEN1yyy2aMGGCvv/+e3MsNzdXjRo1MgOOJMXHx8vX11effPKJWdOzZ0/ZbDazxuVyqaCgQD/88INZEx8f7/W+LpdLubm55+xrzpw5Cg4ONrfIyMjLcr4AAODac9lDTr9+/fTXv/5VOTk5+tOf/qSNGzeqf//+Ki8vlyS53W6FhIR4HePv76/GjRvL7XabNaGhoV41la+rq6kcr0pqaqqKi4vN7cCBA5d2sgAA4JpV44+rqjN06FDz1zExMerQoYNat26tDRs2qE+fPpf77WrEbrfLbrfXag8AAODquOKPkN94441q2rSpvvzyS0lSWFiYDh8+7FVz+vRpHT161LyPJywsTEVFRV41la+rqznXvUAAAOD6csVDzjfffKPvv/9ezZs3lyQ5nU4dO3ZMeXl5Zs369etVUVGhuLg4s2bTpk06deqUWZOdna1bbrlFN9xwg1mTk5Pj9V7Z2dlyOp1X+pQAAEAdUOOQc/z4ceXn5ys/P1+StG/fPuXn56uwsFDHjx/X1KlT9fHHH+vrr79WTk6O7rvvPrVp00Yul0uS1K5dO/Xr109jx47Vli1b9NFHHyk5OVlDhw5VeHi4JOmhhx6SzWbT6NGjtWvXLq1cuVILFixQSkqK2cekSZOUmZmpZ555Rrt379bMmTO1detWJScnX4ZlAQAAdV2NQ87WrVt122236bbbbpMkpaSk6LbbblNaWpr8/Py0fft2DRw4UDfffLNGjx6t2NhYffDBB173wrz66qtq27at+vTpowEDBqhHjx5e34ETHBysd999V/v27VNsbKx+85vfKC0tzeu7dH7xi19oxYoVWrp0qTp27KjXX39da9asUfv27S9lPQAAgEXU+MbjXr16yTCMc45nZWVVO0fjxo21YsWK89Z06NBBH3zwwXlrHnzwQT344IPVvh8AALj+8HdXAQAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAASyLkAAAAS6pxyNm0aZPuvfdehYeHy8fHR2vWrPEaNwxDaWlpat68uQIDAxUfH689e/Z41Rw9elTDhw+Xw+FQo0aNNHr0aB0/ftyrZvv27brjjjsUEBCgyMhIzZ0796xeVq9erbZt2yogIEAxMTFat25dTU8HAABYVI1DTklJiTp27KhFixZVOT537lw9//zzWrJkiT755BMFBQXJ5XLp5MmTZs3w4cO1a9cuZWdna+3atdq0aZPGjRtnjns8HvXt21ctW7ZUXl6e5s2bp5kzZ2rp0qVmzebNmzVs2DCNHj1an332mQYNGqRBgwZp586dNT0lAABgQf41PaB///7q379/lWOGYWj+/PmaPn267rvvPknSX//6V4WGhmrNmjUaOnSovvjiC2VmZurTTz9Vly5dJEkvvPCCBgwYoKefflrh4eF69dVXVVZWppdfflk2m0233nqr8vPz9eyzz5phaMGCBerXr5+mTp0qSXryySeVnZ2thQsXasmSJRe1GAAAwDou6z05+/btk9vtVnx8vLkvODhYcXFxys3NlSTl5uaqUaNGZsCRpPj4ePn6+uqTTz4xa3r27CmbzWbWuFwuFRQU6IcffjBrznyfyprK96lKaWmpPB6P1wYAAKzpsoYct9stSQoNDfXaHxoaao653W6FhIR4jfv7+6tx48ZeNVXNceZ7nKumcrwqc+bMUXBwsLlFRkbW9BQBAEAdcV09XZWamqri4mJzO3DgQG23BAAArpDLGnLCwsIkSUVFRV77i4qKzLGwsDAdPnzYa/z06dM6evSoV01Vc5z5HueqqRyvit1ul8Ph8NoAAIA1XdaQExUVpbCwMOXk5Jj7PB6PPvnkEzmdTkmS0+nUsWPHlJeXZ9asX79eFRUViouLM2s2bdqkU6dOmTXZ2dm65ZZbdMMNN5g1Z75PZU3l+wAAgOtbjUPO8ePHlZ+fr/z8fEn/d7Nxfn6+CgsL5ePjo8mTJ+v3v/+93nrrLe3YsUMPP/ywwsPDNWjQIElSu3bt1K9fP40dO1ZbtmzRRx99pOTkZA0dOlTh4eGSpIceekg2m02jR4/Wrl27tHLlSi1YsEApKSlmH5MmTVJmZqaeeeYZ7d69WzNnztTWrVuVnJx86asCAADqvBo/Qr5161b17t3bfF0ZPBITE5Wenq7HH39cJSUlGjdunI4dO6YePXooMzNTAQEB5jGvvvqqkpOT1adPH/n6+mrw4MF6/vnnzfHg4GC9++67SkpKUmxsrJo2baq0tDSv79L5xS9+oRUrVmj69On67W9/q5tuuklr1qxR+/btL2ohAACAtdQ45PTq1UuGYZxz3MfHR7Nnz9bs2bPPWdO4cWOtWLHivO/ToUMHffDBB+etefDBB/Xggw+ev2EAAHBduq6ergIAANcPQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALAkQg4AALCkyx5yZs6cKR8fH6+tbdu25vjJkyeVlJSkJk2aqEGDBho8eLCKioq85igsLFRCQoLq16+vkJAQTZ06VadPn/aq2bBhgzp37iy73a42bdooPT39cp8KAACow67IlZxbb71Vhw4dMrcPP/zQHJsyZYrefvttrV69Whs3btTBgwd1//33m+Pl5eVKSEhQWVmZNm/erOXLlys9PV1paWlmzb59+5SQkKDevXsrPz9fkydP1pgxY5SVlXUlTgcAANRB/ldkUn9/hYWFnbW/uLhYf/nLX7RixQrdddddkqRXXnlF7dq108cff6zu3bvr3Xff1eeff6733ntPoaGh6tSpk5588klNmzZNM2fOlM1m05IlSxQVFaVnnnlGktSuXTt9+OGHeu655+Ryua7EKQEAgDrmilzJ2bNnj8LDw3XjjTdq+PDhKiwslCTl5eXp1KlTio+PN2vbtm2rFi1aKDc3V5KUm5urmJgYhYaGmjUul0sej0e7du0ya86co7Kmco5zKS0tlcfj8doAAIA1XfaQExcXp/T0dGVmZurFF1/Uvn37dMcdd+jHH3+U2+2WzWZTo0aNvI4JDQ2V2+2WJLndbq+AUzleOXa+Go/HoxMnTpyztzlz5ig4ONjcIiMjL/V0AQDANeqyf1zVv39/89cdOnRQXFycWrZsqVWrVikwMPByv12NpKamKiUlxXzt8XgIOgAAWNQVf4S8UaNGuvnmm/Xll18qLCxMZWVlOnbsmFdNUVGReQ9PWFjYWU9bVb6ursbhcJw3SNntdjkcDq8NAABY0xUPOcePH9fevXvVvHlzxcbGql69esrJyTHHCwoKVFhYKKfTKUlyOp3asWOHDh8+bNZkZ2fL4XAoOjrarDlzjsqayjkAAAAue8j5n//5H23cuFFff/21Nm/erP/+7/+Wn5+fhg0bpuDgYI0ePVopKSl6//33lZeXp1GjRsnpdKp79+6SpL59+yo6OlojRozQv/71L2VlZWn69OlKSkqS3W6XJI0fP15fffWVHn/8ce3evVuLFy/WqlWrNGXKlMt9OgAAoI667PfkfPPNNxo2bJi+//57NWvWTD169NDHH3+sZs2aSZKee+45+fr6avDgwSotLZXL5dLixYvN4/38/LR27VpNmDBBTqdTQUFBSkxM1OzZs82aqKgoZWRkaMqUKVqwYIEiIiK0bNkyHh8HAACmyx5yXnvttfOOBwQEaNGiRVq0aNE5a1q2bKl169add55evXrps88+u6geAQCA9fF3VwEAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEsi5AAAAEuq8yFn0aJFatWqlQICAhQXF6ctW7bUdksAAOAaUKdDzsqVK5WSkqIZM2Zo27Zt6tixo1wulw4fPlzbrQEAgFpWp0POs88+q7Fjx2rUqFGKjo7WkiVLVL9+fb388su13RoAAKhl/rXdwMUqKytTXl6eUlNTzX2+vr6Kj49Xbm5ulceUlpaqtLTUfF1cXCxJ8ng8l72/itKfLvucl+pKnOe1oq6ud13tG8C153r686RyXsMwzltXZ0POd999p/LycoWGhnrtDw0N1e7du6s8Zs6cOZo1a9ZZ+yMjI69Ij9ea4Pm13cH1pa6ud13tG8C150r/efLjjz8qODj4nON1NuRcjNTUVKWkpJivKyoqdPToUTVp0kQ+Pj612Nm5eTweRUZG6sCBA3I4HLXdjuWx3lcX6311sd5XF+t95RiGoR9//FHh4eHnrauzIadp06by8/NTUVGR1/6ioiKFhYVVeYzdbpfdbvfa16hRoyvV4mXlcDj4j+QqYr2vLtb76mK9ry7W+8o43xWcSnX2xmObzabY2Fjl5OSY+yoqKpSTkyOn01mLnQEAgGtBnb2SI0kpKSlKTExUly5d1K1bN82fP18lJSUaNWpUbbcGAABqWZ0OOUOGDNGRI0eUlpYmt9utTp06KTMz86ybkesyu92uGTNmnPUxG64M1vvqYr2vLtb76mK9a5+PUd3zVwAAAHVQnb0nBwAA4HwIOQAAwJIIOQAAwJIIOQAAwJIIOQAAwJIIOdeIb7/9Vr/61a/UpEkTBQYGKiYmRlu3bjXHDcNQWlqamjdvrsDAQMXHx2vPnj212HHddr71PnXqlKZNm6aYmBgFBQUpPDxcDz/8sA4ePFjLXddd1f3+PtP48ePl4+Oj+fPnX90mLeRC1vuLL77QwIEDFRwcrKCgIHXt2lWFhYW11HHdVt16Hz9+XMnJyYqIiFBgYKCio6O1ZMmSWuz4+lGnvyfHKn744Qfdfvvt6t27t9555x01a9ZMe/bs0Q033GDWzJ07V88//7yWL1+uqKgo/e53v5PL5dLnn3+ugICAWuy+7qluvX/66Sdt27ZNv/vd79SxY0f98MMPmjRpkgYOHHjOH8w4twv5/V3pjTfe0Mcff1zt30eDc7uQ9d67d6969Oih0aNHa9asWXI4HNq1axd/llyEC1nvlJQUrV+/Xn//+9/VqlUrvfvuu/r1r3+t8PBwDRw4sBa7vw4YqHXTpk0zevTocc7xiooKIywszJg3b56579ixY4bdbjf+8Y9/XI0WLaW69a7Kli1bDEnG/v37r1BX1nWh6/3NN98Y//Vf/2Xs3LnTaNmypfHcc89d+eYs6ELWe8iQIcavfvWrq9SRtV3Iet96663G7NmzvfZ17tzZ+H//7/9dydZgGAYfV10D3nrrLXXp0kUPPvigQkJCdNttt+nPf/6zOb5v3z653W7Fx8eb+4KDgxUXF6fc3NzaaLlOq269q1JcXCwfH5868xe6XksuZL0rKio0YsQITZ06VbfeemstdWoN1a13RUWFMjIydPPNN8vlcikkJERxcXFas2ZN7TVdh13I7+9f/OIXeuutt/Ttt9/KMAy9//77+ve//62+ffvWUtfXkdpOWTAMu91u2O12IzU11di2bZvx0ksvGQEBAUZ6erphGIbx0UcfGZKMgwcPeh334IMPGr/85S9ro+U6rbr1/rkTJ04YnTt3Nh566KGr3Kk1XMh6//GPfzTuvvtuo6KiwjAMgys5l6C69T506JAhyahfv77x7LPPGp999pkxZ84cw8fHx9iwYUMtd1/3XMjv75MnTxoPP/ywIcnw9/c3bDabsXz58lrs+vpByLkG1KtXz3A6nV77Jk6caHTv3t0wDELO5Vbdep+prKzMuPfee43bbrvNKC4uvlotWkp1671161YjNDTU+Pbbb81xQs7Fq269v/32W0OSMWzYMK+ae++91xg6dOhV69MqLuTPk3nz5hk333yz8dZbbxn/+te/jBdeeMFo0KCBkZ2dfbXbve7wcdU1oHnz5oqOjvba165dO/NJh7CwMElSUVGRV01RUZE5hgtX3XpXOnXqlH75y19q//79ys7OlsPhuJptWkZ16/3BBx/o8OHDatGihfz9/eXv76/9+/frN7/5jVq1alULHddt1a1306ZN5e/vf0H/DaB61a33iRMn9Nvf/lbPPvus7r33XnXo0EHJyckaMmSInn766dpo+bpCyLkG3H777SooKPDa9+9//1stW7aUJEVFRSksLEw5OTnmuMfj0SeffCKn03lVe7WC6tZb+k/A2bNnj9577z01adLkardpGdWt94gRI7R9+3bl5+ebW3h4uKZOnaqsrKzaaLlOq269bTabunbtWu1/A7gw1a33qVOndOrUKfn6ev+49fPzU0VFxVXr87pV25eS8H9P7vj7+xt/+MMfjD179hivvvqqUb9+fePvf/+7WfPUU08ZjRo1Mt58801j+/btxn333WdERUUZJ06cqMXO66bq1rusrMwYOHCgERERYeTn5xuHDh0yt9LS0lruvu65kN/fP8fHVRfvQtb7n//8p1GvXj1j6dKlxp49e4wXXnjB8PPzMz744INa7LxuupD1vvPOO41bb73VeP/9942vvvrKeOWVV4yAgABj8eLFtdj59YGQc414++23jfbt2xt2u91o27atsXTpUq/xiooK43e/+50RGhpq2O12o0+fPkZBQUEtdVv3nW+99+3bZ0iqcnv//fdrr+k6rLrf3z9HyLk0F7Lef/nLX4w2bdoYAQEBRseOHY01a9bUQqfWUN16Hzp0yBg5cqQRHh5uBAQEGLfccovxzDPPmDfa48rxMQzDqM0rSQAAAFcC9+QAAABLIuQAAABLIuQAAABLIuQAAABLIuQAAABLIuQAAABLIuQAAABLIuQAAABLIuQAAABLIuQAAABLIuQAAABL+v9mZhae+l6+KwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjkAAAGzCAYAAADNKAZOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/P9b71AAAACXBIWXMAAA9hAAAPYQGoP6dpAAA6tUlEQVR4nO3df1RVdb7/8Reg54A/DogKyKj4Mw1/FgadW5klIxrT1WqVmbfQzNKwlVJqfKcxp9sMprcZrUxv04w462apM1k3f0YoOCWaouSPiskGB0sPlCbHUAHh8/1jFvt6BBUURbbPx1p7xdmf9/7sz/6w47zcZ2/wM8YYAQAA2Ix/Yw8AAADgciDkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAKjVgQMH5Ofnp/T09MYeimXIkCEaMmSI9fpKjjE9PV1+fn46cOCAta5Lly76xS9+cdn3LUlZWVny8/NTVlbWFdkfYAeEHADXnDfeeOOqCm9nuprHBjQ1fvztKgC1McaorKxMzZs3V0BAQGMPR5KsqzjVVzMudox9+/ZVu3bt6nVVpLKyUhUVFXI6nfLz85P0rys5ffv21erVq+vcz8WOraqqSuXl5XI4HPL359+nQF3wfwrQxJWWll6Wfv38/BQYGHjVBJzaXIkxVs9vQECAAgMDrYBzpfn7+yswMJCAA9QD/7cATcjs2bPl5+enL774Qg899JDatGmjW2+91Wr/n//5H8XExCgoKEihoaF68MEHdfDgwRr9LFy4UN26dVNQUJBiY2P1t7/9rc73u2zcuFG33XabWrZsqZCQEI0cOVJffvllrePcv3+/xo0bp5CQEAUHB2v8+PE6ceJEnY71zTffVPfu3X3GeLbaxujxeDR+/Hh17NhRTqdTHTp00MiRI617abp06aJ9+/YpOztbfn5+8vPzs467+r6b7OxsPfnkkwoLC1PHjh192s68J6faRx99pIEDByowMFDR0dF67733ap2Ps53d5/nGdq57clauXGl9z9u1a6f/+I//0HfffedTM27cOLVq1UrfffedRo0apVatWql9+/Z69tlnVVlZeY7vAND0NWvsAQCov/vvv189e/bUb3/7W1V/4vyb3/xGv/rVr/TAAw/oscce0/fff6/XXntNgwcP1q5duxQSEiJJWrRokaZMmaLbbrtN06ZN04EDBzRq1Ci1adPGekM/l48//lgjRoxQt27dNHv2bJ08eVKvvfaabrnlFu3cuVNdunTxqX/ggQfUtWtXpaWlaefOnXrrrbcUFhaml19++bz7+eMf/6gnnnhC//Zv/6apU6fqH//4h/793/9doaGh6tSp03m3ve+++7Rv3z499dRT6tKli4qLi5WRkaHCwkJ16dJF8+fP11NPPaVWrVrpl7/8pSQpPDzcp48nn3xS7du316xZsy54pezrr7/W6NGjNWnSJCUlJWnJkiW6//77tX79ev385z8/77Znq8vYzpSenq7x48frpptuUlpamoqKirRgwQJ9+umnPt9z6V8ftyUkJCguLk7/9V//pY8//livvPKKunfvrsmTJ9drnECTYQA0GS+88IKRZMaMGeOz/sCBAyYgIMD85je/8Vm/Z88e06xZM2t9WVmZadu2rbnppptMRUWFVZeenm4kmdtvv91aV1BQYCSZJUuWWOsGDhxowsLCzJEjR6x1n3/+ufH39zePPPJIjXE++uijPuO55557TNu2bc97jOXl5SYsLMwMHDjQlJWVWevffPPNC47xxx9/NJLMvHnzzruPPn36+PRTbcmSJUaSufXWW83p06drbSsoKLDWRUVFGUnmr3/9q7WupKTEdOjQwdxwww3Wuur5ONf+zuzzXGPbtGmTkWQ2bdpkjPm/eerbt685efKkVbd69WojycyaNctal5SUZCSZF1980afPG264wcTExNTYF2AXfFwFNEGTJk3yef3ee++pqqpKDzzwgH744QdriYiIUM+ePbVp0yZJ0o4dO3TkyBFNnDhRzZr934XcsWPHqk2bNufd5+HDh5WXl6dx48YpNDTUWt+/f3/9/Oc/19q1ay84zttuu01HjhyR1+s953527Nih4uJiTZo0SQ6Hw1o/btw4BQcHn3eMQUFBcjgcysrK0o8//nje2vOZOHFine/ziYyM1D333GO9drlceuSRR7Rr1y55PJ6LHsOFVM/Tk08+qcDAQGt9YmKievfurTVr1tTYprbvxz/+8Y/LNkagsRFygCaoa9euPq+//vprGWPUs2dPtW/f3mf58ssvVVxcLEn65z//KUnq0aOHz/bNmjWr8VHT2aq37dWrV42266+/Xj/88EONj3Y6d+7s87o6SJ0vgFTvp2fPnj7rmzdvrm7dup13jE6nUy+//LLWrVun8PBwDR48WHPnzq132Dh7fs+nR48eNe63ue666ySp1vt3Gsr5vh+9e/e22qsFBgaqffv2PuvatGlzSWEQuNpxTw7QBAUFBfm8rqqqkp+fn9atW1frFYhWrVpdqaH5ONfVEHMZf3PF1KlTdffdd+v999/Xhg0b9Ktf/UppaWnauHGjbrjhhjr1cfb8XqpzPZF1JW/6vZqfkgMuF67kADbQvXt3GWPUtWtXxcfH11huvvlmSVJUVJQkaf/+/T7bnz59+oJXHaq3zc/Pr9H21VdfqV27dmrZsuUlH0v1fr7++muf9RUVFSooKKhTH927d9czzzyjjz76SHv37lV5ebleeeUVq70hHwPfv39/jdD297//XZKsq2PVV7COHTvmU3f21Zb6jO1834/8/HyrHbiWEXIAG7j33nsVEBCgX//61zXecI0xOnLkiCRp0KBBatu2rf7whz/o9OnTVs3bb799wY8tOnTooIEDB2rp0qU+b9Z79+7VRx99pLvuuqtBjmXQoEFq3769Fi9erPLycmt9enp6jZBwthMnTujUqVM+67p3767WrVurrKzMWteyZcsL9lVXhw4d0qpVq6zXXq9Xf/7znzVw4EBFRERYY5CkzZs3W3WlpaVaunRpjf7qOrZBgwYpLCxMixcv9jm2devW6csvv1RiYuLFHhJgG3xcBdhA9+7d9dJLLyk1NdV6JLx169YqKCjQqlWr9Pjjj+vZZ5+Vw+HQ7Nmz9dRTT+nOO+/UAw88oAMHDig9PV3du3e/4FWEefPmacSIEXK73ZowYYL1CHlwcLBmz57dIMfSvHlzvfTSS3riiSd05513avTo0SooKNCSJUsueE/O3//+dw0dOlQPPPCAoqOj1axZM61atUpFRUV68MEHrbqYmBgtWrRIL730knr06KGwsDDdeeedFzXe6667ThMmTND27dsVHh6uP/3pTyoqKtKSJUusmmHDhqlz586aMGGCpk+froCAAP3pT39S+/btVVhY6NNfXcfWvHlzvfzyyxo/frxuv/12jRkzxnqEvEuXLpo2bdpFHQ9gK435aBeA+ql+FPn777+vtf2vf/2rufXWW03Lli1Ny5YtTe/evU1ycrLJz8/3qXv11VdNVFSUcTqdJjY21nz66acmJibGDB8+3Kqp7RFyY4z5+OOPzS233GKCgoKMy+Uyd999t/niiy/qNM7aHpk+lzfeeMN07drVOJ1OM2jQILN582Zz++23n/cR8h9++MEkJyeb3r17m5YtW5rg4GATFxdnVqxY4dO3x+MxiYmJpnXr1j6PpVePb/v27TXGc65HyBMTE82GDRtM//79jdPpNL179zYrV66ssX1ubq6Ji4szDofDdO7c2fzud7+rtc9zje3sR8irLV++3Nxwww3G6XSa0NBQM3bsWPPtt9/61CQlJZmWLVvWGNO5Hm0H7IK/XQVAVVVVat++ve6991794Q9/aOzhAECD4J4c4Bpz6tSpGvft/PnPf9bRo0d9/qwDADR1XMkBrjFZWVmaNm2a7r//frVt21Y7d+7UH//4R11//fXKzc31+QV8ANCUceMxcI3p0qWLOnXqpFdffVVHjx5VaGioHnnkEc2ZM4eAA8BWuJIDAABsiXtyAACALRFyAACALV3T9+RUVVXp0KFDat26dYP+mncAAHD5GGN0/PhxRUZGyt//3NdrrumQc+jQIXXq1KmxhwEAAC7CwYMH1bFjx3O21yvkLFq0SIsWLbL+kF+fPn00a9YsjRgxQpI0ZMgQZWdn+2zzxBNPaPHixdbrwsJCTZ48WZs2bVKrVq2UlJSktLQ0NWv2f0PJyspSSkqK9u3bp06dOun555/XuHHjfPpduHCh5s2bJ4/HowEDBui1115TbGxsfQ5HrVu3lvSvSXK5XPXaFgAANA6v16tOnTpZ7+PnUq+Q07FjR82ZM0c9e/aUMUZLly7VyJEjtWvXLvXp00eSNHHiRL344ovWNi1atLC+rqysVGJioiIiIrRlyxYdPnxYjzzyiJo3b67f/va3kqSCggIlJiZq0qRJevvtt5WZmanHHntMHTp0UEJCgiRp+fLlSklJ0eLFixUXF6f58+crISFB+fn5CgsLq/PxVH9E5XK5CDkAADQxF7rV5JIfIQ8NDdW8efM0YcIEDRkyRAMHDtT8+fNrrV23bp1+8Ytf6NChQwoPD5ckLV68WDNnztT3338vh8OhmTNnas2aNdq7d6+13YMPPqhjx45p/fr1kqS4uDjddNNNev311yX9696aTp066amnntJzzz13zrGWlZX5/LXe6iRYUlJCyAEAoInwer0KDg6+4Pv3RT9dVVlZqXfffVelpaVyu93W+rffflvt2rVT3759lZqaqhMnTlhtOTk56tevnxVwJCkhIUFer1f79u2zauLj4332lZCQoJycHElSeXm5cnNzfWr8/f0VHx9v1ZxLWlqagoODrYX7cQAAsK9633i8Z88eud1unTp1Sq1atdKqVasUHR0tSXrooYcUFRWlyMhI7d69WzNnzlR+fr7ee+89SZLH4/EJOJKs1x6P57w1Xq9XJ0+e1I8//qjKyspaa7766qvzjj01NVUpKSnW6+orOQAAwH7qHXJ69eqlvLw8lZSU6C9/+YuSkpKUnZ2t6OhoPf7441Zdv3791KFDBw0dOlTffPONunfv3qADvxhOp1NOp7OxhwEAAK6Aen9c5XA41KNHD8XExCgtLU0DBgzQggULaq2Ni4uTJO3fv1+SFBERoaKiIp+a6tcRERHnrXG5XAoKClK7du0UEBBQa011HwAAAJf8G4+rqqp8buY9U15eniSpQ4cOkiS32609e/aouLjYqsnIyJDL5bI+8nK73crMzPTpJyMjw7rvx+FwKCYmxqemqqpKmZmZPvcGAQCAa1u9Pq5KTU3ViBEj1LlzZx0/flzLli1TVlaWNmzYoG+++UbLli3TXXfdpbZt22r37t2aNm2aBg8erP79+0uShg0bpujoaD388MOaO3euPB6Pnn/+eSUnJ1sfI02aNEmvv/66ZsyYoUcffVQbN27UihUrtGbNGmscKSkpSkpK0qBBgxQbG6v58+ertLRU48ePb8CpAQAATZqph0cffdRERUUZh8Nh2rdvb4YOHWo++ugjY4wxhYWFZvDgwSY0NNQ4nU7To0cPM336dFNSUuLTx4EDB8yIESNMUFCQadeunXnmmWdMRUWFT82mTZvMwIEDjcPhMN26dTNLliypMZbXXnvNdO7c2TgcDhMbG2u2bt1an0MxxhhTUlJiJNUYIwAAuHrV9f37kn9PTlNW1+fsAQDA1eOy/54cAACAqxkhBwAA2BIhBwAA2BIhBwAA2BIhBwAA2BIh5zLp8tyaCxcBAIDLhpADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsiZADAABsqV4hZ9GiRerfv79cLpdcLpfcbrfWrVtntZ86dUrJyclq27atWrVqpfvuu09FRUU+fRQWFioxMVEtWrRQWFiYpk+frtOnT/vUZGVl6cYbb5TT6VSPHj2Unp5eYywLFy5Uly5dFBgYqLi4OH322Wf1ORQAAGBz9Qo5HTt21Jw5c5Sbm6sdO3bozjvv1MiRI7Vv3z5J0rRp0/Thhx9q5cqVys7O1qFDh3Tvvfda21dWVioxMVHl5eXasmWLli5dqvT0dM2aNcuqKSgoUGJiou644w7l5eVp6tSpeuyxx7RhwwarZvny5UpJSdELL7ygnTt3asCAAUpISFBxcfGlzgcAALALc4natGlj3nrrLXPs2DHTvHlzs3LlSqvtyy+/NJJMTk6OMcaYtWvXGn9/f+PxeKyaRYsWGZfLZcrKyowxxsyYMcP06dPHZx+jR482CQkJ1uvY2FiTnJxsva6srDSRkZEmLS2tXmMvKSkxkkxJSUm9tquLqJmrG7xPAABQ9/fvi74np7KyUu+++65KS0vldruVm5uriooKxcfHWzW9e/dW586dlZOTI0nKyclRv379FB4ebtUkJCTI6/VaV4NycnJ8+qiuqe6jvLxcubm5PjX+/v6Kj4+3as6lrKxMXq/XZwEAAPZU75CzZ88etWrVSk6nU5MmTdKqVasUHR0tj8cjh8OhkJAQn/rw8HB5PB5Jksfj8Qk41e3Vbeer8Xq9OnnypH744QdVVlbWWlPdx7mkpaUpODjYWjp16lTfwwcAAE1EvUNOr169lJeXp23btmny5MlKSkrSF198cTnG1uBSU1NVUlJiLQcPHmzsIQEAgMukWX03cDgc6tGjhyQpJiZG27dv14IFCzR69GiVl5fr2LFjPldzioqKFBERIUmKiIio8RRU9dNXZ9ac/URWUVGRXC6XgoKCFBAQoICAgFprqvs4F6fTKafTWd9DBgAATdAl/56cqqoqlZWVKSYmRs2bN1dmZqbVlp+fr8LCQrndbkmS2+3Wnj17fJ6CysjIkMvlUnR0tFVzZh/VNdV9OBwOxcTE+NRUVVUpMzPTqgEAAKjXlZzU1FSNGDFCnTt31vHjx7Vs2TJlZWVpw4YNCg4O1oQJE5SSkqLQ0FC5XC499dRTcrvduvnmmyVJw4YNU3R0tB5++GHNnTtXHo9Hzz//vJKTk60rLJMmTdLrr7+uGTNm6NFHH9XGjRu1YsUKrVmzxhpHSkqKkpKSNGjQIMXGxmr+/PkqLS3V+PHjG3BqAABAU1avkFNcXKxHHnlEhw8fVnBwsPr3768NGzbo5z//uSTp97//vfz9/XXfffeprKxMCQkJeuONN6ztAwICtHr1ak2ePFlut1stW7ZUUlKSXnzxRauma9euWrNmjaZNm6YFCxaoY8eOeuutt5SQkGDVjB49Wt9//71mzZolj8ejgQMHav369TVuRgYAANcuP2OMaexBNBav16vg4GCVlJTI5XI1aN9dnlujA3MSG7RPAABQ9/dv/nYVAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwJUIOAACwpXqFnLS0NN10001q3bq1wsLCNGrUKOXn5/vUDBkyRH5+fj7LpEmTfGoKCwuVmJioFi1aKCwsTNOnT9fp06d9arKysnTjjTfK6XSqR48eSk9PrzGehQsXqkuXLgoMDFRcXJw+++yz+hwOAACwsXqFnOzsbCUnJ2vr1q3KyMhQRUWFhg0bptLSUp+6iRMn6vDhw9Yyd+5cq62yslKJiYkqLy/Xli1btHTpUqWnp2vWrFlWTUFBgRITE3XHHXcoLy9PU6dO1WOPPaYNGzZYNcuXL1dKSopeeOEF7dy5UwMGDFBCQoKKi4svdi4AAICdmEtQXFxsJJns7Gxr3e23326efvrpc26zdu1a4+/vbzwej7Vu0aJFxuVymbKyMmOMMTNmzDB9+vTx2W706NEmISHBeh0bG2uSk5Ot15WVlSYyMtKkpaXVefwlJSVGkikpKanzNnUVNXN1g/cJAADq/v59SffklJSUSJJCQ0N91r/99ttq166d+vbtq9TUVJ04ccJqy8nJUb9+/RQeHm6tS0hIkNfr1b59+6ya+Ph4nz4TEhKUk5MjSSovL1dubq5Pjb+/v+Lj462a2pSVlcnr9fosAADAnppd7IZVVVWaOnWqbrnlFvXt29da/9BDDykqKkqRkZHavXu3Zs6cqfz8fL333nuSJI/H4xNwJFmvPR7PeWu8Xq9OnjypH3/8UZWVlbXWfPXVV+ccc1pamn79619f7CEDAIAm5KJDTnJysvbu3atPPvnEZ/3jjz9ufd2vXz916NBBQ4cO1TfffKPu3btf/EgbQGpqqlJSUqzXXq9XnTp1asQRAQCAy+WiQs6UKVO0evVqbd68WR07djxvbVxcnCRp//796t69uyIiImo8BVVUVCRJioiIsP5bve7MGpfLpaCgIAUEBCggIKDWmuo+auN0OuV0Out2kAAAoEmr1z05xhhNmTJFq1at0saNG9W1a9cLbpOXlydJ6tChgyTJ7XZrz549Pk9BZWRkyOVyKTo62qrJzMz06ScjI0Nut1uS5HA4FBMT41NTVVWlzMxMqwYAAFzb6nUlJzk5WcuWLdMHH3yg1q1bW/fQBAcHKygoSN98842WLVumu+66S23bttXu3bs1bdo0DR48WP3795ckDRs2TNHR0Xr44Yc1d+5ceTwePf/880pOTrauskyaNEmvv/66ZsyYoUcffVQbN27UihUrtGbNGmssKSkpSkpK0qBBgxQbG6v58+ertLRU48ePb6i5AQAATVl9HtmSVOuyZMkSY4wxhYWFZvDgwSY0NNQ4nU7To0cPM3369BqPeB04cMCMGDHCBAUFmXbt2plnnnnGVFRU+NRs2rTJDBw40DgcDtOtWzdrH2d67bXXTOfOnY3D4TCxsbFm69at9TkcHiEHAKAJquv7t58xxjRexGpcXq9XwcHBKikpkcvlatC+uzy3RgfmJDZonwAAoO7v3/ztKgAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEuEHAAAYEv1CjlpaWm66aab1Lp1a4WFhWnUqFHKz8/3qTl16pSSk5PVtm1btWrVSvfdd5+Kiop8agoLC5WYmKgWLVooLCxM06dP1+nTp31qsrKydOONN8rpdKpHjx5KT0+vMZ6FCxeqS5cuCgwMVFxcnD777LP6HA4AALCxeoWc7OxsJScna+vWrcrIyFBFRYWGDRum0tJSq2batGn68MMPtXLlSmVnZ+vQoUO69957rfbKykolJiaqvLxcW7Zs0dKlS5Wenq5Zs2ZZNQUFBUpMTNQdd9yhvLw8TZ06VY899pg2bNhg1SxfvlwpKSl64YUXtHPnTg0YMEAJCQkqLi6+lPkAAAB2YS5BcXGxkWSys7ONMcYcO3bMNG/e3KxcudKq+fLLL40kk5OTY4wxZu3atcbf3994PB6rZtGiRcblcpmysjJjjDEzZswwffr08dnX6NGjTUJCgvU6NjbWJCcnW68rKytNZGSkSUtLq/P4S0pKjCRTUlJSj6Oum6iZqxu8TwAAUPf370u6J6ekpESSFBoaKknKzc1VRUWF4uPjrZrevXurc+fOysnJkSTl5OSoX79+Cg8Pt2oSEhLk9Xq1b98+q+bMPqprqvsoLy9Xbm6uT42/v7/i4+OtmtqUlZXJ6/X6LAAAwJ4uOuRUVVVp6tSpuuWWW9S3b19JksfjkcPhUEhIiE9teHi4PB6PVXNmwKlur247X43X69XJkyf1ww8/qLKystaa6j5qk5aWpuDgYGvp1KlT/Q8cAAA0CRcdcpKTk7V37169++67DTmeyyo1NVUlJSXWcvDgwcYeEgAAuEyaXcxGU6ZM0erVq7V582Z17NjRWh8REaHy8nIdO3bM52pOUVGRIiIirJqzn4KqfvrqzJqzn8gqKiqSy+VSUFCQAgICFBAQUGtNdR+1cTqdcjqd9T9gAADQ5NTrSo4xRlOmTNGqVau0ceNGde3a1ac9JiZGzZs3V2ZmprUuPz9fhYWFcrvdkiS32609e/b4PAWVkZEhl8ul6Ohoq+bMPqprqvtwOByKiYnxqamqqlJmZqZVAwAArm31upKTnJysZcuW6YMPPlDr1q2t+1+Cg4MVFBSk4OBgTZgwQSkpKQoNDZXL5dJTTz0lt9utm2++WZI0bNgwRUdH6+GHH9bcuXPl8Xj0/PPPKzk52brKMmnSJL3++uuaMWOGHn30UW3cuFErVqzQmjVrrLGkpKQoKSlJgwYNUmxsrObPn6/S0lKNHz++oeYGAAA0ZfV5ZEtSrcuSJUusmpMnT5onn3zStGnTxrRo0cLcc8895vDhwz79HDhwwIwYMcIEBQWZdu3amWeeecZUVFT41GzatMkMHDjQOBwO061bN599VHvttddM586djcPhMLGxsWbr1q31ORweIQcAoAmq6/u3nzHGNF7Ealxer1fBwcEqKSmRy+Vq0L67PLdGB+YkNmifAACg7u/f/O0qAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS4QcAABgS/UOOZs3b9bdd9+tyMhI+fn56f333/dpHzdunPz8/HyW4cOH+9QcPXpUY8eOlcvlUkhIiCZMmKCffvrJp2b37t267bbbFBgYqE6dOmnu3Lk1xrJy5Ur17t1bgYGB6tevn9auXVvfwwEAADZV75BTWlqqAQMGaOHCheesGT58uA4fPmwt77zzjk/72LFjtW/fPmVkZGj16tXavHmzHn/8cavd6/Vq2LBhioqKUm5urubNm6fZs2frzTfftGq2bNmiMWPGaMKECdq1a5dGjRqlUaNGae/evfU9JAAAYEN+xhhz0Rv7+WnVqlUaNWqUtW7cuHE6duxYjSs81b788ktFR0dr+/btGjRokCRp/fr1uuuuu/Ttt98qMjJSixYt0i9/+Ut5PB45HA5J0nPPPaf3339fX331lSRp9OjRKi0t1erVq62+b775Zg0cOFCLFy+u0/i9Xq+Cg4NVUlIil8t1ETNwbl2eW6MDcxIbtE8AAFD39+/Lck9OVlaWwsLC1KtXL02ePFlHjhyx2nJychQSEmIFHEmKj4+Xv7+/tm3bZtUMHjzYCjiSlJCQoPz8fP34449WTXx8vM9+ExISlJOTc85xlZWVyev1+iwAAMCeGjzkDB8+XH/+85+VmZmpl19+WdnZ2RoxYoQqKyslSR6PR2FhYT7bNGvWTKGhofJ4PFZNeHi4T0316wvVVLfXJi0tTcHBwdbSqVOnSztYAABw1WrW0B0++OCD1tf9+vVT//791b17d2VlZWno0KENvbt6SU1NVUpKivXa6/USdAAAsKnL/gh5t27d1K5dO+3fv1+SFBERoeLiYp+a06dP6+jRo4qIiLBqioqKfGqqX1+oprq9Nk6nUy6Xy2cBAAD2dNlDzrfffqsjR46oQ4cOkiS3261jx44pNzfXqtm4caOqqqoUFxdn1WzevFkVFRVWTUZGhnr16qU2bdpYNZmZmT77ysjIkNvtvtyHBAAAmoB6h5yffvpJeXl5ysvLkyQVFBQoLy9PhYWF+umnnzR9+nRt3bpVBw4cUGZmpkaOHKkePXooISFBknT99ddr+PDhmjhxoj777DN9+umnmjJlih588EFFRkZKkh566CE5HA5NmDBB+/bt0/Lly7VgwQKfj5qefvpprV+/Xq+88oq++uorzZ49Wzt27NCUKVMaYFoAAECTZ+pp06ZNRlKNJSkpyZw4ccIMGzbMtG/f3jRv3txERUWZiRMnGo/H49PHkSNHzJgxY0yrVq2My+Uy48ePN8ePH/ep+fzzz82tt95qnE6n+dnPfmbmzJlTYywrVqww1113nXE4HKZPnz5mzZo19TqWkpISI8mUlJTUdxouKGrm6gbvEwAA1P39+5J+T05Tx+/JAQCg6WnU35MDAADQ2Ag5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5V1CX59Y09hAAALhmEHIAAIAtEXIAAIAtEXIAAIAtEXIAAIAtEXIAAJeMBytwNSLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAW6p3yNm8ebPuvvtuRUZGys/PT++//75PuzFGs2bNUocOHRQUFKT4+Hh9/fXXPjVHjx7V2LFj5XK5FBISogkTJuinn37yqdm9e7duu+02BQYGqlOnTpo7d26NsaxcuVK9e/dWYGCg+vXrp7Vr19b3cAAAgE3VO+SUlpZqwIABWrhwYa3tc+fO1auvvqrFixdr27ZtatmypRISEnTq1CmrZuzYsdq3b58yMjK0evVqbd68WY8//rjV7vV6NWzYMEVFRSk3N1fz5s3T7Nmz9eabb1o1W7Zs0ZgxYzRhwgTt2rVLo0aN0qhRo7R37976HhIAALAjcwkkmVWrVlmvq6qqTEREhJk3b5617tixY8bpdJp33nnHGGPMF198YSSZ7du3WzXr1q0zfn5+5rvvvjPGGPPGG2+YNm3amLKyMqtm5syZplevXtbrBx54wCQmJvqMJy4uzjzxxBN1Hn9JSYmRZEpKSuq8TV1FzVxdp3UAYAf8fMOVVNf37wa9J6egoEAej0fx8fHWuuDgYMXFxSknJ0eSlJOTo5CQEA0aNMiqiY+Pl7+/v7Zt22bVDB48WA6Hw6pJSEhQfn6+fvzxR6vmzP1U11TvpzZlZWXyer0+CwAAsKcGDTkej0eSFB4e7rM+PDzcavN4PAoLC/Npb9asmUJDQ31qauvjzH2cq6a6vTZpaWkKDg62lk6dOtX3EBtUl+fWNOr+AQCws2vq6arU1FSVlJRYy8GDBxt7SAAA4DJp0JATEREhSSoqKvJZX1RUZLVFRESouLjYp/306dM6evSoT01tfZy5j3PVVLfXxul0yuVy+SwAAMCeGjTkdO3aVREREcrMzLTWeb1ebdu2TW63W5Lkdrt17Ngx5ebmWjUbN25UVVWV4uLirJrNmzeroqLCqsnIyFCvXr3Upk0bq+bM/VTXVO8HAABc2+odcn766Sfl5eUpLy9P0r9uNs7Ly1NhYaH8/Pw0depUvfTSS/rf//1f7dmzR4888ogiIyM1atQoSdL111+v4cOHa+LEifrss8/06aefasqUKXrwwQcVGRkpSXrooYfkcDg0YcIE7du3T8uXL9eCBQuUkpJijePpp5/W+vXr9corr+irr77S7NmztWPHDk2ZMuXSZwUAADR99X1sa9OmTUZSjSUpKckY86/HyH/1q1+Z8PBw43Q6zdChQ01+fr5PH0eOHDFjxowxrVq1Mi6Xy4wfP94cP37cp+bzzz83t956q3E6neZnP/uZmTNnTo2xrFixwlx33XXG4XCYPn36mDVr1tTrWBr7EXIeuQRwNbmUn0n8PMOVVNf372b1DUVDhgyRMeac7X5+fnrxxRf14osvnrMmNDRUy5YtO+9++vfvr7/97W/nrbn//vt1//33n3/AAADgmnRNPV0FAACuHYQcAABgS4QcAABgS4QcAABgS4ScK4A/3wAAwJVHyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEAALZEyAEA1EmX59Y09hCAeiHkAABqINDADgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlho85MyePVt+fn4+S+/eva32U6dOKTk5WW3btlWrVq103333qaioyKePwsJCJSYmqkWLFgoLC9P06dN1+vRpn5qsrCzdeOONcjqd6tGjh9LT0xv6UAAAQBN2Wa7k9OnTR4cPH7aWTz75xGqbNm2aPvzwQ61cuVLZ2dk6dOiQ7r33Xqu9srJSiYmJKi8v15YtW7R06VKlp6dr1qxZVk1BQYESExN1xx13KC8vT1OnTtVjjz2mDRs2XI7DAQAATVCzy9Jps2aKiIiosb6kpER//OMftWzZMt15552SpCVLluj666/X1q1bdfPNN+ujjz7SF198oY8//ljh4eEaOHCg/vM//1MzZ87U7Nmz5XA4tHjxYnXt2lWvvPKKJOn666/XJ598ot///vdKSEi4HIcEAACamMtyJefrr79WZGSkunXrprFjx6qwsFCSlJubq4qKCsXHx1u1vXv3VufOnZWTkyNJysnJUb9+/RQeHm7VJCQkyOv1at++fVbNmX1U11T3cS5lZWXyer0+CwAAsKcGDzlxcXFKT0/X+vXrtWjRIhUUFOi2227T8ePH5fF45HA4FBIS4rNNeHi4PB6PJMnj8fgEnOr26rbz1Xi9Xp08efKcY0tLS1NwcLC1dOrU6VIPFwAAXKUa/OOqESNGWF/3799fcXFxioqK0ooVKxQUFNTQu6uX1NRUpaSkWK+9Xi9BBwAAm7rsj5CHhITouuuu0/79+xUREaHy8nIdO3bMp6aoqMi6hyciIqLG01bVry9U43K5zhuknE6nXC6XzwIAuDT8nStcrS57yPnpp5/0zTffqEOHDoqJiVHz5s2VmZlptefn56uwsFBut1uS5Ha7tWfPHhUXF1s1GRkZcrlcio6OtmrO7KO6proPAACABg85zz77rLKzs3XgwAFt2bJF99xzjwICAjRmzBgFBwdrwoQJSklJ0aZNm5Sbm6vx48fL7Xbr5ptvliQNGzZM0dHRevjhh/X5559rw4YNev7555WcnCyn0ylJmjRpkv7xj39oxowZ+uqrr/TGG29oxYoVmjZtWkMfDgAAaKIa/J6cb7/9VmPGjNGRI0fUvn173Xrrrdq6davat28vSfr9738vf39/3XfffSorK1NCQoLeeOMNa/uAgACtXr1akydPltvtVsuWLZWUlKQXX3zRqunatavWrFmjadOmacGCBerYsaPeeustHh8HAACWBg8577777nnbAwMDtXDhQi1cuPCcNVFRUVq7du15+xkyZIh27dp1UWMEAAD2x9+uAgBYujy35pJuJOYmZFxNCDkAAMCWCDkAAMCWCDkAAMCWCDkAAMCWCDkAgFpd6k3IQGMj5AAAAFsi5AAALgpXeXC1I+QAAABbIuQAAABbIuQAAOqMj6jQlBByAACALRFyAACALRFyAACALRFyAACALRFyAACALRFyAACALRFyAADnxWPjaKoIOQAAwJYIOQAAwJYIOQAAwJYIOQAAwJYIOQAAwJYIOQAAwJYIOQCABsUj55cX81t3hBwAAGBLhBwAAGBLhBwAAGBLhJwrjM9SAQC4Mgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AADAlgg5AIB64xeboikg5DQSfkAAAHB5EXIAAGhi6vMP5Wv5H9WEHAAAYEtNPuQsXLhQXbp0UWBgoOLi4vTZZ5819pAAAMBVoEmHnOXLlyslJUUvvPCCdu7cqQEDBighIUHFxcWNPTQAANDImnTI+d3vfqeJEydq/Pjxio6O1uLFi9WiRQv96U9/auyhAQBwTbsa7gVq1tgDuFjl5eXKzc1Vamqqtc7f31/x8fHKycmpdZuysjKVlZVZr0tKSiRJXq+3wcdXVXbC6rf666qyE9b+zvwaAK4G1T+XanO+n2H8bLuy6jvHZ74fXUmXc7/V/Rpjzl9omqjvvvvOSDJbtmzxWT99+nQTGxtb6zYvvPCCkcTCwsLCwsJig+XgwYPnzQpN9krOxUhNTVVKSor1uqqqSkePHlXbtm3l5+fXIPvwer3q1KmTDh48KJfL1SB92hnzVXfMVf0wX3XHXNUP81V3l2uujDE6fvy4IiMjz1vXZENOu3btFBAQoKKiIp/1RUVFioiIqHUbp9Mpp9Ppsy4kJOSyjM/lcnHy1wPzVXfMVf0wX3XHXNUP81V3l2OugoODL1jTZG88djgciomJUWZmprWuqqpKmZmZcrvdjTgyAABwNWiyV3IkKSUlRUlJSRo0aJBiY2M1f/58lZaWavz48Y09NAAA0MiadMgZPXq0vv/+e82aNUsej0cDBw7U+vXrFR4e3mhjcjqdeuGFF2p8LIbaMV91x1zVD/NVd8xV/TBfddfYc+VnzIWevwIAAGh6muw9OQAAAOdDyAEAALZEyAEAALZEyAEAALZEyAEAALZEyKmjzZs36+6771ZkZKT8/Pz0/vvv+7QbYzRr1ix16NBBQUFBio+P19dff+1Tc/ToUY0dO1Yul0shISGaMGGCfvrppyt4FFfGheZq3Lhx8vPz81mGDx/uU3OtzFVaWppuuukmtW7dWmFhYRo1apTy8/N9ak6dOqXk5GS1bdtWrVq10n333VfjN30XFhYqMTFRLVq0UFhYmKZPn67Tp09fyUO57OoyV0OGDKlxbk2aNMmn5lqYK0latGiR+vfvb/2mWbfbrXXr1lntnFf/50JzxXl1fnPmzJGfn5+mTp1qrbtazi9CTh2VlpZqwIABWrhwYa3tc+fO1auvvqrFixdr27ZtatmypRISEnTq1CmrZuzYsdq3b58yMjK0evVqbd68WY8//viVOoQr5kJzJUnDhw/X4cOHreWdd97xab9W5io7O1vJycnaunWrMjIyVFFRoWHDhqm0tNSqmTZtmj788EOtXLlS2dnZOnTokO69916rvbKyUomJiSovL9eWLVu0dOlSpaena9asWY1xSJdNXeZKkiZOnOhzbs2dO9dqu1bmSpI6duyoOXPmKDc3Vzt27NCdd96pkSNHat++fZI4r850obmSOK/OZfv27frv//5v9e/f32f9VXN+NcifBL/GSDKrVq2yXldVVZmIiAgzb948a92xY8eM0+k077zzjjHGmC+++MJIMtu3b7dq1q1bZ/z8/Mx33313xcZ+pZ09V8YYk5SUZEaOHHnOba7VuTLGmOLiYiPJZGdnG2P+dR41b97crFy50qr58ssvjSSTk5NjjDFm7dq1xt/f33g8Hqtm0aJFxuVymbKysit7AFfQ2XNljDG33367efrpp8+5zbU6V9XatGlj3nrrLc6rOqieK2M4r87l+PHjpmfPniYjI8Nnjq6m84srOQ2goKBAHo9H8fHx1rrg4GDFxcUpJydHkpSTk6OQkBANGjTIqomPj5e/v7+2bdt2xcfc2LKyshQWFqZevXpp8uTJOnLkiNV2Lc9VSUmJJCk0NFSSlJubq4qKCp9zq3fv3urcubPPudWvXz+f3/SdkJAgr9fr8y9Ruzl7rqq9/fbbateunfr27avU1FSdOHHCartW56qyslLvvvuuSktL5Xa7Oa/O4+y5qsZ5VVNycrISExN9ziPp6vq51aT/rMPVwuPxSFKNPycRHh5utXk8HoWFhfm0N2vWTKGhoVbNtWL48OG699571bVrV33zzTf6f//v/2nEiBHKyclRQEDANTtXVVVVmjp1qm655Rb17dtX0r/OG4fDoZCQEJ/as8+t2s696jY7qm2uJOmhhx5SVFSUIiMjtXv3bs2cOVP5+fl67733JF17c7Vnzx653W6dOnVKrVq10qpVqxQdHa28vDzOq7Oca64kzqvavPvuu9q5c6e2b99eo+1q+rlFyMEV9+CDD1pf9+vXT/3791f37t2VlZWloUOHNuLIGldycrL27t2rTz75pLGHctU711yded9Wv3791KFDBw0dOlTffPONunfvfqWH2eh69eqlvLw8lZSU6C9/+YuSkpKUnZ3d2MO6Kp1rrqKjozmvznLw4EE9/fTTysjIUGBgYGMP57z4uKoBRERESFKNO8eLioqstoiICBUXF/u0nz59WkePHrVqrlXdunVTu3bttH//fknX5lxNmTJFq1ev1qZNm9SxY0drfUREhMrLy3Xs2DGf+rPPrdrOveo2uznXXNUmLi5OknzOrWtprhwOh3r06KGYmBilpaVpwIABWrBgAedVLc41V7W51s+r3NxcFRcX68Ybb1SzZs3UrFkzZWdn69VXX1WzZs0UHh5+1ZxfhJwG0LVrV0VERCgzM9Na5/V6tW3bNuszXbfbrWPHjik3N9eq2bhxo6qqqqz/Ya5V3377rY4cOaIOHTpIurbmyhijKVOmaNWqVdq4caO6du3q0x4TE6PmzZv7nFv5+fkqLCz0Obf27NnjEwwzMjLkcrmsy+12cKG5qk1eXp4k+Zxb18JcnUtVVZXKyso4r+qgeq5qc62fV0OHDtWePXuUl5dnLYMGDdLYsWOtr6+a86vBbmG2uePHj5tdu3aZXbt2GUnmd7/7ndm1a5f55z//aYwxZs6cOSYkJMR88MEHZvfu3WbkyJGma9eu5uTJk1Yfw4cPNzfccIPZtm2b+eSTT0zPnj3NmDFjGuuQLpvzzdXx48fNs88+a3JyckxBQYH5+OOPzY033mh69uxpTp06ZfVxrczV5MmTTXBwsMnKyjKHDx+2lhMnTlg1kyZNMp07dzYbN240O3bsMG6327jdbqv99OnTpm/fvmbYsGEmLy/PrF+/3rRv396kpqY2xiFdNheaq/3795sXX3zR7NixwxQUFJgPPvjAdOvWzQwePNjq41qZK2OMee6550x2drYpKCgwu3fvNs8995zx8/MzH330kTGG8+pM55srzqu6OfsJtKvl/CLk1NGmTZuMpBpLUlKSMeZfj5H/6le/MuHh4cbpdJqhQ4ea/Px8nz6OHDlixowZY1q1amVcLpcZP368OX78eCMczeV1vrk6ceKEGTZsmGnfvr1p3ry5iYqKMhMnTvR5jNCYa2euapsnSWbJkiVWzcmTJ82TTz5p2rRpY1q0aGHuuecec/jwYZ9+Dhw4YEaMGGGCgoJMu3btzDPPPGMqKiqu8NFcXheaq8LCQjN48GATGhpqnE6n6dGjh5k+fbopKSnx6edamCtjjHn00UdNVFSUcTgcpn379mbo0KFWwDGG8+pM55srzqu6OTvkXC3nl58xxjTcdSEAAICrA/fkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAWyLkAAAAW/r/ITgXs6ZqQ9UAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "#Visualize client distribution across districts and regions\n",
    "for col in ['disrict','region']:\n",
    "    region = client_train.groupby([col])['client_id'].count()\n",
    "    plt.bar(x=region.index, height=region.values)\n",
    "    plt.title(col+' distribution')\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0-LOrxdIU7-h"
   },
   "source": [
    "## Feature Engineering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "id": "aVypp4ahU9tq"
   },
   "outputs": [],
   "source": [
    "#convert the column invoice_date to date time format on both the invoice train and invoice test\n",
    "for df in [invoice_train,invoice_test]:\n",
    "    df['invoice_date'] = pd.to_datetime(df['invoice_date'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "id": "sW2W5i63VYyK"
   },
   "outputs": [],
   "source": [
    "#encode labels in categorical column\n",
    "d={\"ELEC\":0,\"GAZ\":1}\n",
    "invoice_train['counter_type']=invoice_train['counter_type'].map(d)\n",
    "invoice_test['counter_type']=invoice_test['counter_type'].map(d)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "id": "NldYnn0GWHAk"
   },
   "outputs": [],
   "source": [
    "#convert categorical columns to int for model\n",
    "client_train['client_catg'] = client_train['client_catg'].astype(int)\n",
    "client_train['disrict'] = client_train['disrict'].astype(int)\n",
    "\n",
    "client_test['client_catg'] = client_test['client_catg'].astype(int)\n",
    "client_test['disrict'] = client_test['disrict'].astype(int)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "id": "Bt7LJHhyXJji"
   },
   "outputs": [],
   "source": [
    "def aggregate_by_client_id(invoice_data):\n",
    "    aggs = {}\n",
    "    aggs['consommation_level_1'] = ['mean', 'sum', 'min', 'max']\n",
    "    aggs['consommation_level_2'] = ['mean', 'sum', 'min', 'max']\n",
    "    aggs['consommation_level_3'] = ['mean', 'sum', 'min', 'max']\n",
    "    aggs['consommation_level_4'] = ['mean', 'sum', 'min', 'max']\n",
    "    aggs['reading_remarque'] = ['mean', 'min', 'max']\n",
    "    aggs['counter_code'] = ['mean', 'min', 'max']\n",
    "    aggs['old_index'] = ['min','max']\n",
    "    aggs['new_index'] = ['min','max']\n",
    "\n",
    "\n",
    "\n",
    "    agg_trans = invoice_data.groupby(['client_id']).agg(aggs)\n",
    "    agg_trans.columns = ['_'.join(col).strip() for col in agg_trans.columns.values]\n",
    "    agg_trans.reset_index(inplace=True)\n",
    "\n",
    "    df = (invoice_data.groupby('client_id')\n",
    "            .size()\n",
    "            .reset_index(name='{}transactions_count'.format('1')))\n",
    "    return pd.merge(df, agg_trans, on='client_id', how='left')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "id": "8gUjxEmKWZjp"
   },
   "outputs": [],
   "source": [
    "#group invoice data by client_id\n",
    "agg_train = aggregate_by_client_id(invoice_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>client_id</th>\n",
       "      <th>1transactions_count</th>\n",
       "      <th>consommation_level_1_mean</th>\n",
       "      <th>consommation_level_1_sum</th>\n",
       "      <th>consommation_level_1_min</th>\n",
       "      <th>consommation_level_1_max</th>\n",
       "      <th>consommation_level_2_mean</th>\n",
       "      <th>consommation_level_2_sum</th>\n",
       "      <th>consommation_level_2_min</th>\n",
       "      <th>consommation_level_2_max</th>\n",
       "      <th>consommation_level_3_mean</th>\n",
       "      <th>consommation_level_3_sum</th>\n",
       "      <th>consommation_level_3_min</th>\n",
       "      <th>consommation_level_3_max</th>\n",
       "      <th>consommation_level_4_mean</th>\n",
       "      <th>consommation_level_4_sum</th>\n",
       "      <th>consommation_level_4_min</th>\n",
       "      <th>consommation_level_4_max</th>\n",
       "      <th>reading_remarque_mean</th>\n",
       "      <th>reading_remarque_min</th>\n",
       "      <th>reading_remarque_max</th>\n",
       "      <th>counter_code_mean</th>\n",
       "      <th>counter_code_min</th>\n",
       "      <th>counter_code_max</th>\n",
       "      <th>old_index_min</th>\n",
       "      <th>old_index_max</th>\n",
       "      <th>new_index_min</th>\n",
       "      <th>new_index_max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>train_Client_0</td>\n",
       "      <td>35</td>\n",
       "      <td>352.400000</td>\n",
       "      <td>12334</td>\n",
       "      <td>38</td>\n",
       "      <td>1200</td>\n",
       "      <td>10.571429</td>\n",
       "      <td>370</td>\n",
       "      <td>0</td>\n",
       "      <td>186</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.971429</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.685714</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>3685</td>\n",
       "      <td>16493</td>\n",
       "      <td>3809</td>\n",
       "      <td>17078</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>train_Client_1</td>\n",
       "      <td>37</td>\n",
       "      <td>557.540541</td>\n",
       "      <td>20629</td>\n",
       "      <td>190</td>\n",
       "      <td>1207</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.216216</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.000000</td>\n",
       "      <td>203</td>\n",
       "      <td>203</td>\n",
       "      <td>4110</td>\n",
       "      <td>23940</td>\n",
       "      <td>4661</td>\n",
       "      <td>25022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>train_Client_10</td>\n",
       "      <td>18</td>\n",
       "      <td>798.611111</td>\n",
       "      <td>14375</td>\n",
       "      <td>188</td>\n",
       "      <td>2400</td>\n",
       "      <td>37.888889</td>\n",
       "      <td>682</td>\n",
       "      <td>0</td>\n",
       "      <td>682</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.055556</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.222222</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>25515</td>\n",
       "      <td>41532</td>\n",
       "      <td>25974</td>\n",
       "      <td>44614</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>train_Client_100</td>\n",
       "      <td>20</td>\n",
       "      <td>1.200000</td>\n",
       "      <td>24</td>\n",
       "      <td>0</td>\n",
       "      <td>15</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.150000</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>413.000000</td>\n",
       "      <td>413</td>\n",
       "      <td>413</td>\n",
       "      <td>90</td>\n",
       "      <td>99</td>\n",
       "      <td>90</td>\n",
       "      <td>114</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>train_Client_1000</td>\n",
       "      <td>14</td>\n",
       "      <td>663.714286</td>\n",
       "      <td>9292</td>\n",
       "      <td>124</td>\n",
       "      <td>800</td>\n",
       "      <td>104.857143</td>\n",
       "      <td>1468</td>\n",
       "      <td>0</td>\n",
       "      <td>400</td>\n",
       "      <td>117.357143</td>\n",
       "      <td>1643</td>\n",
       "      <td>0</td>\n",
       "      <td>800</td>\n",
       "      <td>36.714286</td>\n",
       "      <td>514</td>\n",
       "      <td>0</td>\n",
       "      <td>382</td>\n",
       "      <td>8.857143</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>207.000000</td>\n",
       "      <td>207</td>\n",
       "      <td>207</td>\n",
       "      <td>0</td>\n",
       "      <td>13337</td>\n",
       "      <td>959</td>\n",
       "      <td>13729</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           client_id  1transactions_count  consommation_level_1_mean  \\\n",
       "0     train_Client_0                   35                 352.400000   \n",
       "1     train_Client_1                   37                 557.540541   \n",
       "2    train_Client_10                   18                 798.611111   \n",
       "3   train_Client_100                   20                   1.200000   \n",
       "4  train_Client_1000                   14                 663.714286   \n",
       "\n",
       "   consommation_level_1_sum  consommation_level_1_min  \\\n",
       "0                     12334                        38   \n",
       "1                     20629                       190   \n",
       "2                     14375                       188   \n",
       "3                        24                         0   \n",
       "4                      9292                       124   \n",
       "\n",
       "   consommation_level_1_max  consommation_level_2_mean  \\\n",
       "0                      1200                  10.571429   \n",
       "1                      1207                   0.000000   \n",
       "2                      2400                  37.888889   \n",
       "3                        15                   0.000000   \n",
       "4                       800                 104.857143   \n",
       "\n",
       "   consommation_level_2_sum  consommation_level_2_min  \\\n",
       "0                       370                         0   \n",
       "1                         0                         0   \n",
       "2                       682                         0   \n",
       "3                         0                         0   \n",
       "4                      1468                         0   \n",
       "\n",
       "   consommation_level_2_max  consommation_level_3_mean  \\\n",
       "0                       186                   0.000000   \n",
       "1                         0                   0.000000   \n",
       "2                       682                   0.000000   \n",
       "3                         0                   0.000000   \n",
       "4                       400                 117.357143   \n",
       "\n",
       "   consommation_level_3_sum  consommation_level_3_min  \\\n",
       "0                         0                         0   \n",
       "1                         0                         0   \n",
       "2                         0                         0   \n",
       "3                         0                         0   \n",
       "4                      1643                         0   \n",
       "\n",
       "   consommation_level_3_max  consommation_level_4_mean  \\\n",
       "0                         0                   0.000000   \n",
       "1                         0                   0.000000   \n",
       "2                         0                   0.000000   \n",
       "3                         0                   0.000000   \n",
       "4                       800                  36.714286   \n",
       "\n",
       "   consommation_level_4_sum  consommation_level_4_min  \\\n",
       "0                         0                         0   \n",
       "1                         0                         0   \n",
       "2                         0                         0   \n",
       "3                         0                         0   \n",
       "4                       514                         0   \n",
       "\n",
       "   consommation_level_4_max  reading_remarque_mean  reading_remarque_min  \\\n",
       "0                         0               6.971429                     6   \n",
       "1                         0               7.216216                     6   \n",
       "2                         0               7.055556                     6   \n",
       "3                         0               6.150000                     6   \n",
       "4                       382               8.857143                     8   \n",
       "\n",
       "   reading_remarque_max  counter_code_mean  counter_code_min  \\\n",
       "0                     9         203.685714               203   \n",
       "1                     9         203.000000               203   \n",
       "2                     9         203.222222               203   \n",
       "3                     9         413.000000               413   \n",
       "4                     9         207.000000               207   \n",
       "\n",
       "   counter_code_max  old_index_min  old_index_max  new_index_min  \\\n",
       "0               207           3685          16493           3809   \n",
       "1               203           4110          23940           4661   \n",
       "2               207          25515          41532          25974   \n",
       "3               413             90             99             90   \n",
       "4               207              0          13337            959   \n",
       "\n",
       "   new_index_max  \n",
       "0          17078  \n",
       "1          25022  \n",
       "2          44614  \n",
       "3            114  \n",
       "4          13729  "
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agg_train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 224
    },
    "id": "5cIpY1tGWrkw",
    "outputId": "c57de3d5-3efe-4e8a-9a0c-7362f05fca4c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(135493, 28)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>client_id</th>\n",
       "      <th>1transactions_count</th>\n",
       "      <th>consommation_level_1_mean</th>\n",
       "      <th>consommation_level_1_sum</th>\n",
       "      <th>consommation_level_1_min</th>\n",
       "      <th>consommation_level_1_max</th>\n",
       "      <th>consommation_level_2_mean</th>\n",
       "      <th>consommation_level_2_sum</th>\n",
       "      <th>consommation_level_2_min</th>\n",
       "      <th>consommation_level_2_max</th>\n",
       "      <th>consommation_level_3_mean</th>\n",
       "      <th>consommation_level_3_sum</th>\n",
       "      <th>consommation_level_3_min</th>\n",
       "      <th>consommation_level_3_max</th>\n",
       "      <th>consommation_level_4_mean</th>\n",
       "      <th>consommation_level_4_sum</th>\n",
       "      <th>consommation_level_4_min</th>\n",
       "      <th>consommation_level_4_max</th>\n",
       "      <th>reading_remarque_mean</th>\n",
       "      <th>reading_remarque_min</th>\n",
       "      <th>reading_remarque_max</th>\n",
       "      <th>counter_code_mean</th>\n",
       "      <th>counter_code_min</th>\n",
       "      <th>counter_code_max</th>\n",
       "      <th>old_index_min</th>\n",
       "      <th>old_index_max</th>\n",
       "      <th>new_index_min</th>\n",
       "      <th>new_index_max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>train_Client_0</td>\n",
       "      <td>35</td>\n",
       "      <td>352.400000</td>\n",
       "      <td>12334</td>\n",
       "      <td>38</td>\n",
       "      <td>1200</td>\n",
       "      <td>10.571429</td>\n",
       "      <td>370</td>\n",
       "      <td>0</td>\n",
       "      <td>186</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.971429</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.685714</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>3685</td>\n",
       "      <td>16493</td>\n",
       "      <td>3809</td>\n",
       "      <td>17078</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>train_Client_1</td>\n",
       "      <td>37</td>\n",
       "      <td>557.540541</td>\n",
       "      <td>20629</td>\n",
       "      <td>190</td>\n",
       "      <td>1207</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.216216</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.000000</td>\n",
       "      <td>203</td>\n",
       "      <td>203</td>\n",
       "      <td>4110</td>\n",
       "      <td>23940</td>\n",
       "      <td>4661</td>\n",
       "      <td>25022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>train_Client_10</td>\n",
       "      <td>18</td>\n",
       "      <td>798.611111</td>\n",
       "      <td>14375</td>\n",
       "      <td>188</td>\n",
       "      <td>2400</td>\n",
       "      <td>37.888889</td>\n",
       "      <td>682</td>\n",
       "      <td>0</td>\n",
       "      <td>682</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.055556</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.222222</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>25515</td>\n",
       "      <td>41532</td>\n",
       "      <td>25974</td>\n",
       "      <td>44614</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>train_Client_100</td>\n",
       "      <td>20</td>\n",
       "      <td>1.200000</td>\n",
       "      <td>24</td>\n",
       "      <td>0</td>\n",
       "      <td>15</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.150000</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>413.000000</td>\n",
       "      <td>413</td>\n",
       "      <td>413</td>\n",
       "      <td>90</td>\n",
       "      <td>99</td>\n",
       "      <td>90</td>\n",
       "      <td>114</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>train_Client_1000</td>\n",
       "      <td>14</td>\n",
       "      <td>663.714286</td>\n",
       "      <td>9292</td>\n",
       "      <td>124</td>\n",
       "      <td>800</td>\n",
       "      <td>104.857143</td>\n",
       "      <td>1468</td>\n",
       "      <td>0</td>\n",
       "      <td>400</td>\n",
       "      <td>117.357143</td>\n",
       "      <td>1643</td>\n",
       "      <td>0</td>\n",
       "      <td>800</td>\n",
       "      <td>36.714286</td>\n",
       "      <td>514</td>\n",
       "      <td>0</td>\n",
       "      <td>382</td>\n",
       "      <td>8.857143</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>207.000000</td>\n",
       "      <td>207</td>\n",
       "      <td>207</td>\n",
       "      <td>0</td>\n",
       "      <td>13337</td>\n",
       "      <td>959</td>\n",
       "      <td>13729</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           client_id  1transactions_count  consommation_level_1_mean  \\\n",
       "0     train_Client_0                   35                 352.400000   \n",
       "1     train_Client_1                   37                 557.540541   \n",
       "2    train_Client_10                   18                 798.611111   \n",
       "3   train_Client_100                   20                   1.200000   \n",
       "4  train_Client_1000                   14                 663.714286   \n",
       "\n",
       "   consommation_level_1_sum  consommation_level_1_min  \\\n",
       "0                     12334                        38   \n",
       "1                     20629                       190   \n",
       "2                     14375                       188   \n",
       "3                        24                         0   \n",
       "4                      9292                       124   \n",
       "\n",
       "   consommation_level_1_max  consommation_level_2_mean  \\\n",
       "0                      1200                  10.571429   \n",
       "1                      1207                   0.000000   \n",
       "2                      2400                  37.888889   \n",
       "3                        15                   0.000000   \n",
       "4                       800                 104.857143   \n",
       "\n",
       "   consommation_level_2_sum  consommation_level_2_min  \\\n",
       "0                       370                         0   \n",
       "1                         0                         0   \n",
       "2                       682                         0   \n",
       "3                         0                         0   \n",
       "4                      1468                         0   \n",
       "\n",
       "   consommation_level_2_max  consommation_level_3_mean  \\\n",
       "0                       186                   0.000000   \n",
       "1                         0                   0.000000   \n",
       "2                       682                   0.000000   \n",
       "3                         0                   0.000000   \n",
       "4                       400                 117.357143   \n",
       "\n",
       "   consommation_level_3_sum  consommation_level_3_min  \\\n",
       "0                         0                         0   \n",
       "1                         0                         0   \n",
       "2                         0                         0   \n",
       "3                         0                         0   \n",
       "4                      1643                         0   \n",
       "\n",
       "   consommation_level_3_max  consommation_level_4_mean  \\\n",
       "0                         0                   0.000000   \n",
       "1                         0                   0.000000   \n",
       "2                         0                   0.000000   \n",
       "3                         0                   0.000000   \n",
       "4                       800                  36.714286   \n",
       "\n",
       "   consommation_level_4_sum  consommation_level_4_min  \\\n",
       "0                         0                         0   \n",
       "1                         0                         0   \n",
       "2                         0                         0   \n",
       "3                         0                         0   \n",
       "4                       514                         0   \n",
       "\n",
       "   consommation_level_4_max  reading_remarque_mean  reading_remarque_min  \\\n",
       "0                         0               6.971429                     6   \n",
       "1                         0               7.216216                     6   \n",
       "2                         0               7.055556                     6   \n",
       "3                         0               6.150000                     6   \n",
       "4                       382               8.857143                     8   \n",
       "\n",
       "   reading_remarque_max  counter_code_mean  counter_code_min  \\\n",
       "0                     9         203.685714               203   \n",
       "1                     9         203.000000               203   \n",
       "2                     9         203.222222               203   \n",
       "3                     9         413.000000               413   \n",
       "4                     9         207.000000               207   \n",
       "\n",
       "   counter_code_max  old_index_min  old_index_max  new_index_min  \\\n",
       "0               207           3685          16493           3809   \n",
       "1               203           4110          23940           4661   \n",
       "2               207          25515          41532          25974   \n",
       "3               413             90             99             90   \n",
       "4               207              0          13337            959   \n",
       "\n",
       "   new_index_max  \n",
       "0          17078  \n",
       "1          25022  \n",
       "2          44614  \n",
       "3            114  \n",
       "4          13729  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(agg_train.shape)\n",
    "agg_train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "id": "beXOMOEMW97l"
   },
   "outputs": [],
   "source": [
    "#merge aggregate data with client dataset\n",
    "train = pd.merge(client_train,agg_train, on='client_id', how='left')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>disrict</th>\n",
       "      <th>client_id</th>\n",
       "      <th>client_catg</th>\n",
       "      <th>region</th>\n",
       "      <th>target</th>\n",
       "      <th>creation_year</th>\n",
       "      <th>creation_month</th>\n",
       "      <th>creation_day</th>\n",
       "      <th>1transactions_count</th>\n",
       "      <th>consommation_level_1_mean</th>\n",
       "      <th>consommation_level_1_sum</th>\n",
       "      <th>consommation_level_1_min</th>\n",
       "      <th>consommation_level_1_max</th>\n",
       "      <th>consommation_level_2_mean</th>\n",
       "      <th>consommation_level_2_sum</th>\n",
       "      <th>consommation_level_2_min</th>\n",
       "      <th>consommation_level_2_max</th>\n",
       "      <th>consommation_level_3_mean</th>\n",
       "      <th>consommation_level_3_sum</th>\n",
       "      <th>consommation_level_3_min</th>\n",
       "      <th>consommation_level_3_max</th>\n",
       "      <th>consommation_level_4_mean</th>\n",
       "      <th>consommation_level_4_sum</th>\n",
       "      <th>consommation_level_4_min</th>\n",
       "      <th>consommation_level_4_max</th>\n",
       "      <th>reading_remarque_mean</th>\n",
       "      <th>reading_remarque_min</th>\n",
       "      <th>reading_remarque_max</th>\n",
       "      <th>counter_code_mean</th>\n",
       "      <th>counter_code_min</th>\n",
       "      <th>counter_code_max</th>\n",
       "      <th>old_index_min</th>\n",
       "      <th>old_index_max</th>\n",
       "      <th>new_index_min</th>\n",
       "      <th>new_index_max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>60</td>\n",
       "      <td>train_Client_0</td>\n",
       "      <td>11</td>\n",
       "      <td>101</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1994</td>\n",
       "      <td>12</td>\n",
       "      <td>31</td>\n",
       "      <td>35</td>\n",
       "      <td>352.400000</td>\n",
       "      <td>12334</td>\n",
       "      <td>38</td>\n",
       "      <td>1200</td>\n",
       "      <td>10.571429</td>\n",
       "      <td>370</td>\n",
       "      <td>0</td>\n",
       "      <td>186</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.971429</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.685714</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>3685</td>\n",
       "      <td>16493</td>\n",
       "      <td>3809</td>\n",
       "      <td>17078</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>69</td>\n",
       "      <td>train_Client_1</td>\n",
       "      <td>11</td>\n",
       "      <td>107</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2002</td>\n",
       "      <td>5</td>\n",
       "      <td>29</td>\n",
       "      <td>37</td>\n",
       "      <td>557.540541</td>\n",
       "      <td>20629</td>\n",
       "      <td>190</td>\n",
       "      <td>1207</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.216216</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.000000</td>\n",
       "      <td>203</td>\n",
       "      <td>203</td>\n",
       "      <td>4110</td>\n",
       "      <td>23940</td>\n",
       "      <td>4661</td>\n",
       "      <td>25022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>62</td>\n",
       "      <td>train_Client_10</td>\n",
       "      <td>11</td>\n",
       "      <td>301</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1986</td>\n",
       "      <td>3</td>\n",
       "      <td>13</td>\n",
       "      <td>18</td>\n",
       "      <td>798.611111</td>\n",
       "      <td>14375</td>\n",
       "      <td>188</td>\n",
       "      <td>2400</td>\n",
       "      <td>37.888889</td>\n",
       "      <td>682</td>\n",
       "      <td>0</td>\n",
       "      <td>682</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.055556</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.222222</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>25515</td>\n",
       "      <td>41532</td>\n",
       "      <td>25974</td>\n",
       "      <td>44614</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>69</td>\n",
       "      <td>train_Client_100</td>\n",
       "      <td>11</td>\n",
       "      <td>105</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1996</td>\n",
       "      <td>7</td>\n",
       "      <td>11</td>\n",
       "      <td>20</td>\n",
       "      <td>1.200000</td>\n",
       "      <td>24</td>\n",
       "      <td>0</td>\n",
       "      <td>15</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.150000</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>413.000000</td>\n",
       "      <td>413</td>\n",
       "      <td>413</td>\n",
       "      <td>90</td>\n",
       "      <td>99</td>\n",
       "      <td>90</td>\n",
       "      <td>114</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>62</td>\n",
       "      <td>train_Client_1000</td>\n",
       "      <td>11</td>\n",
       "      <td>303</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2014</td>\n",
       "      <td>10</td>\n",
       "      <td>14</td>\n",
       "      <td>14</td>\n",
       "      <td>663.714286</td>\n",
       "      <td>9292</td>\n",
       "      <td>124</td>\n",
       "      <td>800</td>\n",
       "      <td>104.857143</td>\n",
       "      <td>1468</td>\n",
       "      <td>0</td>\n",
       "      <td>400</td>\n",
       "      <td>117.357143</td>\n",
       "      <td>1643</td>\n",
       "      <td>0</td>\n",
       "      <td>800</td>\n",
       "      <td>36.714286</td>\n",
       "      <td>514</td>\n",
       "      <td>0</td>\n",
       "      <td>382</td>\n",
       "      <td>8.857143</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>207.000000</td>\n",
       "      <td>207</td>\n",
       "      <td>207</td>\n",
       "      <td>0</td>\n",
       "      <td>13337</td>\n",
       "      <td>959</td>\n",
       "      <td>13729</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   disrict          client_id  client_catg  region  target  creation_year  \\\n",
       "0       60     train_Client_0           11     101     0.0           1994   \n",
       "1       69     train_Client_1           11     107     0.0           2002   \n",
       "2       62    train_Client_10           11     301     0.0           1986   \n",
       "3       69   train_Client_100           11     105     0.0           1996   \n",
       "4       62  train_Client_1000           11     303     0.0           2014   \n",
       "\n",
       "   creation_month  creation_day  1transactions_count  \\\n",
       "0              12            31                   35   \n",
       "1               5            29                   37   \n",
       "2               3            13                   18   \n",
       "3               7            11                   20   \n",
       "4              10            14                   14   \n",
       "\n",
       "   consommation_level_1_mean  consommation_level_1_sum  \\\n",
       "0                 352.400000                     12334   \n",
       "1                 557.540541                     20629   \n",
       "2                 798.611111                     14375   \n",
       "3                   1.200000                        24   \n",
       "4                 663.714286                      9292   \n",
       "\n",
       "   consommation_level_1_min  consommation_level_1_max  \\\n",
       "0                        38                      1200   \n",
       "1                       190                      1207   \n",
       "2                       188                      2400   \n",
       "3                         0                        15   \n",
       "4                       124                       800   \n",
       "\n",
       "   consommation_level_2_mean  consommation_level_2_sum  \\\n",
       "0                  10.571429                       370   \n",
       "1                   0.000000                         0   \n",
       "2                  37.888889                       682   \n",
       "3                   0.000000                         0   \n",
       "4                 104.857143                      1468   \n",
       "\n",
       "   consommation_level_2_min  consommation_level_2_max  \\\n",
       "0                         0                       186   \n",
       "1                         0                         0   \n",
       "2                         0                       682   \n",
       "3                         0                         0   \n",
       "4                         0                       400   \n",
       "\n",
       "   consommation_level_3_mean  consommation_level_3_sum  \\\n",
       "0                   0.000000                         0   \n",
       "1                   0.000000                         0   \n",
       "2                   0.000000                         0   \n",
       "3                   0.000000                         0   \n",
       "4                 117.357143                      1643   \n",
       "\n",
       "   consommation_level_3_min  consommation_level_3_max  \\\n",
       "0                         0                         0   \n",
       "1                         0                         0   \n",
       "2                         0                         0   \n",
       "3                         0                         0   \n",
       "4                         0                       800   \n",
       "\n",
       "   consommation_level_4_mean  consommation_level_4_sum  \\\n",
       "0                   0.000000                         0   \n",
       "1                   0.000000                         0   \n",
       "2                   0.000000                         0   \n",
       "3                   0.000000                         0   \n",
       "4                  36.714286                       514   \n",
       "\n",
       "   consommation_level_4_min  consommation_level_4_max  reading_remarque_mean  \\\n",
       "0                         0                         0               6.971429   \n",
       "1                         0                         0               7.216216   \n",
       "2                         0                         0               7.055556   \n",
       "3                         0                         0               6.150000   \n",
       "4                         0                       382               8.857143   \n",
       "\n",
       "   reading_remarque_min  reading_remarque_max  counter_code_mean  \\\n",
       "0                     6                     9         203.685714   \n",
       "1                     6                     9         203.000000   \n",
       "2                     6                     9         203.222222   \n",
       "3                     6                     9         413.000000   \n",
       "4                     8                     9         207.000000   \n",
       "\n",
       "   counter_code_min  counter_code_max  old_index_min  old_index_max  \\\n",
       "0               203               207           3685          16493   \n",
       "1               203               203           4110          23940   \n",
       "2               203               207          25515          41532   \n",
       "3               413               413             90             99   \n",
       "4               207               207              0          13337   \n",
       "\n",
       "   new_index_min  new_index_max  \n",
       "0           3809          17078  \n",
       "1           4661          25022  \n",
       "2          25974          44614  \n",
       "3             90            114  \n",
       "4            959          13729  "
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "id": "y6SFawOVXFDX"
   },
   "outputs": [],
   "source": [
    "#aggregate test set\n",
    "agg_test = aggregate_by_client_id(invoice_test)\n",
    "test = pd.merge(client_test,agg_test, on='client_id', how='left')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "dg0832jcXzkI",
    "outputId": "2c6ff557-0258-4310-d8f6-aad0627c9ddb"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((135493, 35), (58069, 34))"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.shape, test.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>disrict</th>\n",
       "      <th>client_id</th>\n",
       "      <th>client_catg</th>\n",
       "      <th>region</th>\n",
       "      <th>target</th>\n",
       "      <th>creation_year</th>\n",
       "      <th>creation_month</th>\n",
       "      <th>creation_day</th>\n",
       "      <th>1transactions_count</th>\n",
       "      <th>consommation_level_1_mean</th>\n",
       "      <th>consommation_level_1_sum</th>\n",
       "      <th>consommation_level_1_min</th>\n",
       "      <th>consommation_level_1_max</th>\n",
       "      <th>consommation_level_2_mean</th>\n",
       "      <th>consommation_level_2_sum</th>\n",
       "      <th>consommation_level_2_min</th>\n",
       "      <th>consommation_level_2_max</th>\n",
       "      <th>consommation_level_3_mean</th>\n",
       "      <th>consommation_level_3_sum</th>\n",
       "      <th>consommation_level_3_min</th>\n",
       "      <th>consommation_level_3_max</th>\n",
       "      <th>consommation_level_4_mean</th>\n",
       "      <th>consommation_level_4_sum</th>\n",
       "      <th>consommation_level_4_min</th>\n",
       "      <th>consommation_level_4_max</th>\n",
       "      <th>reading_remarque_mean</th>\n",
       "      <th>reading_remarque_min</th>\n",
       "      <th>reading_remarque_max</th>\n",
       "      <th>counter_code_mean</th>\n",
       "      <th>counter_code_min</th>\n",
       "      <th>counter_code_max</th>\n",
       "      <th>old_index_min</th>\n",
       "      <th>old_index_max</th>\n",
       "      <th>new_index_min</th>\n",
       "      <th>new_index_max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>60</td>\n",
       "      <td>train_Client_0</td>\n",
       "      <td>11</td>\n",
       "      <td>101</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1994</td>\n",
       "      <td>12</td>\n",
       "      <td>31</td>\n",
       "      <td>35</td>\n",
       "      <td>352.400000</td>\n",
       "      <td>12334</td>\n",
       "      <td>38</td>\n",
       "      <td>1200</td>\n",
       "      <td>10.571429</td>\n",
       "      <td>370</td>\n",
       "      <td>0</td>\n",
       "      <td>186</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.971429</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.685714</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>3685</td>\n",
       "      <td>16493</td>\n",
       "      <td>3809</td>\n",
       "      <td>17078</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>69</td>\n",
       "      <td>train_Client_1</td>\n",
       "      <td>11</td>\n",
       "      <td>107</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2002</td>\n",
       "      <td>5</td>\n",
       "      <td>29</td>\n",
       "      <td>37</td>\n",
       "      <td>557.540541</td>\n",
       "      <td>20629</td>\n",
       "      <td>190</td>\n",
       "      <td>1207</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.216216</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.000000</td>\n",
       "      <td>203</td>\n",
       "      <td>203</td>\n",
       "      <td>4110</td>\n",
       "      <td>23940</td>\n",
       "      <td>4661</td>\n",
       "      <td>25022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>62</td>\n",
       "      <td>train_Client_10</td>\n",
       "      <td>11</td>\n",
       "      <td>301</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1986</td>\n",
       "      <td>3</td>\n",
       "      <td>13</td>\n",
       "      <td>18</td>\n",
       "      <td>798.611111</td>\n",
       "      <td>14375</td>\n",
       "      <td>188</td>\n",
       "      <td>2400</td>\n",
       "      <td>37.888889</td>\n",
       "      <td>682</td>\n",
       "      <td>0</td>\n",
       "      <td>682</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.055556</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.222222</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>25515</td>\n",
       "      <td>41532</td>\n",
       "      <td>25974</td>\n",
       "      <td>44614</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>69</td>\n",
       "      <td>train_Client_100</td>\n",
       "      <td>11</td>\n",
       "      <td>105</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1996</td>\n",
       "      <td>7</td>\n",
       "      <td>11</td>\n",
       "      <td>20</td>\n",
       "      <td>1.200000</td>\n",
       "      <td>24</td>\n",
       "      <td>0</td>\n",
       "      <td>15</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.150000</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>413.000000</td>\n",
       "      <td>413</td>\n",
       "      <td>413</td>\n",
       "      <td>90</td>\n",
       "      <td>99</td>\n",
       "      <td>90</td>\n",
       "      <td>114</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>62</td>\n",
       "      <td>train_Client_1000</td>\n",
       "      <td>11</td>\n",
       "      <td>303</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2014</td>\n",
       "      <td>10</td>\n",
       "      <td>14</td>\n",
       "      <td>14</td>\n",
       "      <td>663.714286</td>\n",
       "      <td>9292</td>\n",
       "      <td>124</td>\n",
       "      <td>800</td>\n",
       "      <td>104.857143</td>\n",
       "      <td>1468</td>\n",
       "      <td>0</td>\n",
       "      <td>400</td>\n",
       "      <td>117.357143</td>\n",
       "      <td>1643</td>\n",
       "      <td>0</td>\n",
       "      <td>800</td>\n",
       "      <td>36.714286</td>\n",
       "      <td>514</td>\n",
       "      <td>0</td>\n",
       "      <td>382</td>\n",
       "      <td>8.857143</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>207.000000</td>\n",
       "      <td>207</td>\n",
       "      <td>207</td>\n",
       "      <td>0</td>\n",
       "      <td>13337</td>\n",
       "      <td>959</td>\n",
       "      <td>13729</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   disrict          client_id  client_catg  region  target  creation_year  \\\n",
       "0       60     train_Client_0           11     101     0.0           1994   \n",
       "1       69     train_Client_1           11     107     0.0           2002   \n",
       "2       62    train_Client_10           11     301     0.0           1986   \n",
       "3       69   train_Client_100           11     105     0.0           1996   \n",
       "4       62  train_Client_1000           11     303     0.0           2014   \n",
       "\n",
       "   creation_month  creation_day  1transactions_count  \\\n",
       "0              12            31                   35   \n",
       "1               5            29                   37   \n",
       "2               3            13                   18   \n",
       "3               7            11                   20   \n",
       "4              10            14                   14   \n",
       "\n",
       "   consommation_level_1_mean  consommation_level_1_sum  \\\n",
       "0                 352.400000                     12334   \n",
       "1                 557.540541                     20629   \n",
       "2                 798.611111                     14375   \n",
       "3                   1.200000                        24   \n",
       "4                 663.714286                      9292   \n",
       "\n",
       "   consommation_level_1_min  consommation_level_1_max  \\\n",
       "0                        38                      1200   \n",
       "1                       190                      1207   \n",
       "2                       188                      2400   \n",
       "3                         0                        15   \n",
       "4                       124                       800   \n",
       "\n",
       "   consommation_level_2_mean  consommation_level_2_sum  \\\n",
       "0                  10.571429                       370   \n",
       "1                   0.000000                         0   \n",
       "2                  37.888889                       682   \n",
       "3                   0.000000                         0   \n",
       "4                 104.857143                      1468   \n",
       "\n",
       "   consommation_level_2_min  consommation_level_2_max  \\\n",
       "0                         0                       186   \n",
       "1                         0                         0   \n",
       "2                         0                       682   \n",
       "3                         0                         0   \n",
       "4                         0                       400   \n",
       "\n",
       "   consommation_level_3_mean  consommation_level_3_sum  \\\n",
       "0                   0.000000                         0   \n",
       "1                   0.000000                         0   \n",
       "2                   0.000000                         0   \n",
       "3                   0.000000                         0   \n",
       "4                 117.357143                      1643   \n",
       "\n",
       "   consommation_level_3_min  consommation_level_3_max  \\\n",
       "0                         0                         0   \n",
       "1                         0                         0   \n",
       "2                         0                         0   \n",
       "3                         0                         0   \n",
       "4                         0                       800   \n",
       "\n",
       "   consommation_level_4_mean  consommation_level_4_sum  \\\n",
       "0                   0.000000                         0   \n",
       "1                   0.000000                         0   \n",
       "2                   0.000000                         0   \n",
       "3                   0.000000                         0   \n",
       "4                  36.714286                       514   \n",
       "\n",
       "   consommation_level_4_min  consommation_level_4_max  reading_remarque_mean  \\\n",
       "0                         0                         0               6.971429   \n",
       "1                         0                         0               7.216216   \n",
       "2                         0                         0               7.055556   \n",
       "3                         0                         0               6.150000   \n",
       "4                         0                       382               8.857143   \n",
       "\n",
       "   reading_remarque_min  reading_remarque_max  counter_code_mean  \\\n",
       "0                     6                     9         203.685714   \n",
       "1                     6                     9         203.000000   \n",
       "2                     6                     9         203.222222   \n",
       "3                     6                     9         413.000000   \n",
       "4                     8                     9         207.000000   \n",
       "\n",
       "   counter_code_min  counter_code_max  old_index_min  old_index_max  \\\n",
       "0               203               207           3685          16493   \n",
       "1               203               203           4110          23940   \n",
       "2               203               207          25515          41532   \n",
       "3               413               413             90             99   \n",
       "4               207               207              0          13337   \n",
       "\n",
       "   new_index_min  new_index_max  \n",
       "0           3809          17078  \n",
       "1           4661          25022  \n",
       "2          25974          44614  \n",
       "3             90            114  \n",
       "4            959          13729  "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "def calculate_diff_months(df, date_column='invoice_date', id_column='client_id'):\n",
    "    # Convert the date string column to pandas datetime objects\n",
    "    df[date_column] = pd.to_datetime(df[date_column])\n",
    "\n",
    "    # Group by 'id_column' and calculate the difference between the maximum and minimum dates for each group\n",
    "    date_diff_per_id = df.groupby(id_column)[date_column].agg(lambda x: x.max() - x.min())\n",
    "\n",
    "    # Convert the difference to the number of months\n",
    "    date_diff_per_id = date_diff_per_id.apply(lambda x: x.days)  # Convert to numeric days difference\n",
    "    date_diff_per_id = date_diff_per_id / 30  # Divide by 30 to get the number of months\n",
    "\n",
    "    # Add the new column \"diff_months\" to the DataFrame\n",
    "    df[\"diff_months\"] = df[id_column].map(date_diff_per_id)\n",
    "\n",
    "    return df\n",
    "invoice_train = calculate_diff_months(invoice_train, date_column='invoice_date', id_column='client_id')\n",
    "\n",
    "columns_to_drop=['invoice_date','counter_statue','reading_remarque','counter_code','counter_coefficient','old_index','new_index']\n",
    "invoice_train.drop(columns=columns_to_drop, inplace=True)\n",
    "#train['index_difference'] = train['new_index_mean'] - train['old_index_mean']\n",
    "invoice_train = invoice_train.drop_duplicates(subset='client_id', keep='first')\n",
    "train = pd.merge(train,invoice_train, on='client_id', how='left')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>disrict</th>\n",
       "      <th>client_id</th>\n",
       "      <th>client_catg</th>\n",
       "      <th>region</th>\n",
       "      <th>target</th>\n",
       "      <th>creation_year</th>\n",
       "      <th>creation_month</th>\n",
       "      <th>creation_day</th>\n",
       "      <th>1transactions_count</th>\n",
       "      <th>consommation_level_1_mean</th>\n",
       "      <th>consommation_level_1_sum</th>\n",
       "      <th>consommation_level_1_min</th>\n",
       "      <th>consommation_level_1_max</th>\n",
       "      <th>consommation_level_2_mean</th>\n",
       "      <th>consommation_level_2_sum</th>\n",
       "      <th>consommation_level_2_min</th>\n",
       "      <th>consommation_level_2_max</th>\n",
       "      <th>consommation_level_3_mean</th>\n",
       "      <th>consommation_level_3_sum</th>\n",
       "      <th>consommation_level_3_min</th>\n",
       "      <th>consommation_level_3_max</th>\n",
       "      <th>consommation_level_4_mean</th>\n",
       "      <th>consommation_level_4_sum</th>\n",
       "      <th>consommation_level_4_min</th>\n",
       "      <th>consommation_level_4_max</th>\n",
       "      <th>reading_remarque_mean</th>\n",
       "      <th>reading_remarque_min</th>\n",
       "      <th>reading_remarque_max</th>\n",
       "      <th>counter_code_mean</th>\n",
       "      <th>counter_code_min</th>\n",
       "      <th>counter_code_max</th>\n",
       "      <th>old_index_min</th>\n",
       "      <th>old_index_max</th>\n",
       "      <th>new_index_min</th>\n",
       "      <th>new_index_max</th>\n",
       "      <th>tarif_type</th>\n",
       "      <th>counter_number</th>\n",
       "      <th>consommation_level_1</th>\n",
       "      <th>consommation_level_2</th>\n",
       "      <th>consommation_level_3</th>\n",
       "      <th>consommation_level_4</th>\n",
       "      <th>months_number</th>\n",
       "      <th>counter_type</th>\n",
       "      <th>diff_months</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>60</td>\n",
       "      <td>train_Client_0</td>\n",
       "      <td>11</td>\n",
       "      <td>101</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1994</td>\n",
       "      <td>12</td>\n",
       "      <td>31</td>\n",
       "      <td>35</td>\n",
       "      <td>352.400000</td>\n",
       "      <td>12334</td>\n",
       "      <td>38</td>\n",
       "      <td>1200</td>\n",
       "      <td>10.571429</td>\n",
       "      <td>370</td>\n",
       "      <td>0</td>\n",
       "      <td>186</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.971429</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.685714</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>3685</td>\n",
       "      <td>16493</td>\n",
       "      <td>3809</td>\n",
       "      <td>17078</td>\n",
       "      <td>11</td>\n",
       "      <td>1335667</td>\n",
       "      <td>82</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>163.366667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>69</td>\n",
       "      <td>train_Client_1</td>\n",
       "      <td>11</td>\n",
       "      <td>107</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2002</td>\n",
       "      <td>5</td>\n",
       "      <td>29</td>\n",
       "      <td>37</td>\n",
       "      <td>557.540541</td>\n",
       "      <td>20629</td>\n",
       "      <td>190</td>\n",
       "      <td>1207</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.216216</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.000000</td>\n",
       "      <td>203</td>\n",
       "      <td>203</td>\n",
       "      <td>4110</td>\n",
       "      <td>23940</td>\n",
       "      <td>4661</td>\n",
       "      <td>25022</td>\n",
       "      <td>11</td>\n",
       "      <td>678902</td>\n",
       "      <td>388</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>163.766667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>62</td>\n",
       "      <td>train_Client_10</td>\n",
       "      <td>11</td>\n",
       "      <td>301</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1986</td>\n",
       "      <td>3</td>\n",
       "      <td>13</td>\n",
       "      <td>18</td>\n",
       "      <td>798.611111</td>\n",
       "      <td>14375</td>\n",
       "      <td>188</td>\n",
       "      <td>2400</td>\n",
       "      <td>37.888889</td>\n",
       "      <td>682</td>\n",
       "      <td>0</td>\n",
       "      <td>682</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.055556</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.222222</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>25515</td>\n",
       "      <td>41532</td>\n",
       "      <td>25974</td>\n",
       "      <td>44614</td>\n",
       "      <td>11</td>\n",
       "      <td>572765</td>\n",
       "      <td>407</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>164.033333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>69</td>\n",
       "      <td>train_Client_100</td>\n",
       "      <td>11</td>\n",
       "      <td>105</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1996</td>\n",
       "      <td>7</td>\n",
       "      <td>11</td>\n",
       "      <td>20</td>\n",
       "      <td>1.200000</td>\n",
       "      <td>24</td>\n",
       "      <td>0</td>\n",
       "      <td>15</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.150000</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>413.000000</td>\n",
       "      <td>413</td>\n",
       "      <td>413</td>\n",
       "      <td>90</td>\n",
       "      <td>99</td>\n",
       "      <td>90</td>\n",
       "      <td>114</td>\n",
       "      <td>11</td>\n",
       "      <td>2078</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>88.800000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>62</td>\n",
       "      <td>train_Client_1000</td>\n",
       "      <td>11</td>\n",
       "      <td>303</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2014</td>\n",
       "      <td>10</td>\n",
       "      <td>14</td>\n",
       "      <td>14</td>\n",
       "      <td>663.714286</td>\n",
       "      <td>9292</td>\n",
       "      <td>124</td>\n",
       "      <td>800</td>\n",
       "      <td>104.857143</td>\n",
       "      <td>1468</td>\n",
       "      <td>0</td>\n",
       "      <td>400</td>\n",
       "      <td>117.357143</td>\n",
       "      <td>1643</td>\n",
       "      <td>0</td>\n",
       "      <td>800</td>\n",
       "      <td>36.714286</td>\n",
       "      <td>514</td>\n",
       "      <td>0</td>\n",
       "      <td>382</td>\n",
       "      <td>8.857143</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>207.000000</td>\n",
       "      <td>207</td>\n",
       "      <td>207</td>\n",
       "      <td>0</td>\n",
       "      <td>13337</td>\n",
       "      <td>959</td>\n",
       "      <td>13729</td>\n",
       "      <td>11</td>\n",
       "      <td>19575</td>\n",
       "      <td>800</td>\n",
       "      <td>159</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>52.833333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   disrict          client_id  client_catg  region  target  creation_year  \\\n",
       "0       60     train_Client_0           11     101     0.0           1994   \n",
       "1       69     train_Client_1           11     107     0.0           2002   \n",
       "2       62    train_Client_10           11     301     0.0           1986   \n",
       "3       69   train_Client_100           11     105     0.0           1996   \n",
       "4       62  train_Client_1000           11     303     0.0           2014   \n",
       "\n",
       "   creation_month  creation_day  1transactions_count  \\\n",
       "0              12            31                   35   \n",
       "1               5            29                   37   \n",
       "2               3            13                   18   \n",
       "3               7            11                   20   \n",
       "4              10            14                   14   \n",
       "\n",
       "   consommation_level_1_mean  consommation_level_1_sum  \\\n",
       "0                 352.400000                     12334   \n",
       "1                 557.540541                     20629   \n",
       "2                 798.611111                     14375   \n",
       "3                   1.200000                        24   \n",
       "4                 663.714286                      9292   \n",
       "\n",
       "   consommation_level_1_min  consommation_level_1_max  \\\n",
       "0                        38                      1200   \n",
       "1                       190                      1207   \n",
       "2                       188                      2400   \n",
       "3                         0                        15   \n",
       "4                       124                       800   \n",
       "\n",
       "   consommation_level_2_mean  consommation_level_2_sum  \\\n",
       "0                  10.571429                       370   \n",
       "1                   0.000000                         0   \n",
       "2                  37.888889                       682   \n",
       "3                   0.000000                         0   \n",
       "4                 104.857143                      1468   \n",
       "\n",
       "   consommation_level_2_min  consommation_level_2_max  \\\n",
       "0                         0                       186   \n",
       "1                         0                         0   \n",
       "2                         0                       682   \n",
       "3                         0                         0   \n",
       "4                         0                       400   \n",
       "\n",
       "   consommation_level_3_mean  consommation_level_3_sum  \\\n",
       "0                   0.000000                         0   \n",
       "1                   0.000000                         0   \n",
       "2                   0.000000                         0   \n",
       "3                   0.000000                         0   \n",
       "4                 117.357143                      1643   \n",
       "\n",
       "   consommation_level_3_min  consommation_level_3_max  \\\n",
       "0                         0                         0   \n",
       "1                         0                         0   \n",
       "2                         0                         0   \n",
       "3                         0                         0   \n",
       "4                         0                       800   \n",
       "\n",
       "   consommation_level_4_mean  consommation_level_4_sum  \\\n",
       "0                   0.000000                         0   \n",
       "1                   0.000000                         0   \n",
       "2                   0.000000                         0   \n",
       "3                   0.000000                         0   \n",
       "4                  36.714286                       514   \n",
       "\n",
       "   consommation_level_4_min  consommation_level_4_max  reading_remarque_mean  \\\n",
       "0                         0                         0               6.971429   \n",
       "1                         0                         0               7.216216   \n",
       "2                         0                         0               7.055556   \n",
       "3                         0                         0               6.150000   \n",
       "4                         0                       382               8.857143   \n",
       "\n",
       "   reading_remarque_min  reading_remarque_max  counter_code_mean  \\\n",
       "0                     6                     9         203.685714   \n",
       "1                     6                     9         203.000000   \n",
       "2                     6                     9         203.222222   \n",
       "3                     6                     9         413.000000   \n",
       "4                     8                     9         207.000000   \n",
       "\n",
       "   counter_code_min  counter_code_max  old_index_min  old_index_max  \\\n",
       "0               203               207           3685          16493   \n",
       "1               203               203           4110          23940   \n",
       "2               203               207          25515          41532   \n",
       "3               413               413             90             99   \n",
       "4               207               207              0          13337   \n",
       "\n",
       "   new_index_min  new_index_max  tarif_type  counter_number  \\\n",
       "0           3809          17078          11         1335667   \n",
       "1           4661          25022          11          678902   \n",
       "2          25974          44614          11          572765   \n",
       "3             90            114          11            2078   \n",
       "4            959          13729          11           19575   \n",
       "\n",
       "   consommation_level_1  consommation_level_2  consommation_level_3  \\\n",
       "0                    82                     0                     0   \n",
       "1                   388                     0                     0   \n",
       "2                   407                     0                     0   \n",
       "3                     0                     0                     0   \n",
       "4                   800                   159                     0   \n",
       "\n",
       "   consommation_level_4  months_number  counter_type  diff_months  \n",
       "0                     0              4             0   163.366667  \n",
       "1                     0              2             0   163.766667  \n",
       "2                     0              4             0   164.033333  \n",
       "3                     0              4             0    88.800000  \n",
       "4                     0              4             0    52.833333  "
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>disrict</th>\n",
       "      <th>client_id</th>\n",
       "      <th>client_catg</th>\n",
       "      <th>region</th>\n",
       "      <th>creation_year</th>\n",
       "      <th>creation_month</th>\n",
       "      <th>creation_day</th>\n",
       "      <th>1transactions_count</th>\n",
       "      <th>consommation_level_1_mean</th>\n",
       "      <th>consommation_level_1_sum</th>\n",
       "      <th>consommation_level_1_min</th>\n",
       "      <th>consommation_level_1_max</th>\n",
       "      <th>consommation_level_2_mean</th>\n",
       "      <th>consommation_level_2_sum</th>\n",
       "      <th>consommation_level_2_min</th>\n",
       "      <th>consommation_level_2_max</th>\n",
       "      <th>consommation_level_3_mean</th>\n",
       "      <th>consommation_level_3_sum</th>\n",
       "      <th>consommation_level_3_min</th>\n",
       "      <th>consommation_level_3_max</th>\n",
       "      <th>consommation_level_4_mean</th>\n",
       "      <th>consommation_level_4_sum</th>\n",
       "      <th>consommation_level_4_min</th>\n",
       "      <th>consommation_level_4_max</th>\n",
       "      <th>reading_remarque_mean</th>\n",
       "      <th>reading_remarque_min</th>\n",
       "      <th>reading_remarque_max</th>\n",
       "      <th>counter_code_mean</th>\n",
       "      <th>counter_code_min</th>\n",
       "      <th>counter_code_max</th>\n",
       "      <th>old_index_min</th>\n",
       "      <th>old_index_max</th>\n",
       "      <th>new_index_min</th>\n",
       "      <th>new_index_max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>62</td>\n",
       "      <td>test_Client_0</td>\n",
       "      <td>11</td>\n",
       "      <td>307</td>\n",
       "      <td>2002</td>\n",
       "      <td>5</td>\n",
       "      <td>28</td>\n",
       "      <td>37</td>\n",
       "      <td>488.135135</td>\n",
       "      <td>18061</td>\n",
       "      <td>0</td>\n",
       "      <td>1090</td>\n",
       "      <td>3.243243</td>\n",
       "      <td>120</td>\n",
       "      <td>0</td>\n",
       "      <td>120</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.810811</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.000000</td>\n",
       "      <td>203</td>\n",
       "      <td>203</td>\n",
       "      <td>3057</td>\n",
       "      <td>21348</td>\n",
       "      <td>3383</td>\n",
       "      <td>21677</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>69</td>\n",
       "      <td>test_Client_1</td>\n",
       "      <td>11</td>\n",
       "      <td>103</td>\n",
       "      <td>2009</td>\n",
       "      <td>8</td>\n",
       "      <td>6</td>\n",
       "      <td>22</td>\n",
       "      <td>1091.409091</td>\n",
       "      <td>24011</td>\n",
       "      <td>11</td>\n",
       "      <td>3600</td>\n",
       "      <td>843.136364</td>\n",
       "      <td>18549</td>\n",
       "      <td>0</td>\n",
       "      <td>4053</td>\n",
       "      <td>182.318182</td>\n",
       "      <td>4011</td>\n",
       "      <td>0</td>\n",
       "      <td>1144</td>\n",
       "      <td>586.318182</td>\n",
       "      <td>12899</td>\n",
       "      <td>0</td>\n",
       "      <td>12899</td>\n",
       "      <td>7.636364</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>433.000000</td>\n",
       "      <td>433</td>\n",
       "      <td>433</td>\n",
       "      <td>7</td>\n",
       "      <td>55213</td>\n",
       "      <td>800</td>\n",
       "      <td>57083</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>62</td>\n",
       "      <td>test_Client_10</td>\n",
       "      <td>11</td>\n",
       "      <td>310</td>\n",
       "      <td>2004</td>\n",
       "      <td>4</td>\n",
       "      <td>7</td>\n",
       "      <td>74</td>\n",
       "      <td>554.040541</td>\n",
       "      <td>40999</td>\n",
       "      <td>0</td>\n",
       "      <td>1200</td>\n",
       "      <td>37.364865</td>\n",
       "      <td>2765</td>\n",
       "      <td>0</td>\n",
       "      <td>400</td>\n",
       "      <td>15.743243</td>\n",
       "      <td>1165</td>\n",
       "      <td>0</td>\n",
       "      <td>800</td>\n",
       "      <td>0.162162</td>\n",
       "      <td>12</td>\n",
       "      <td>0</td>\n",
       "      <td>12</td>\n",
       "      <td>7.459459</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>119.648649</td>\n",
       "      <td>5</td>\n",
       "      <td>207</td>\n",
       "      <td>0</td>\n",
       "      <td>37461</td>\n",
       "      <td>41</td>\n",
       "      <td>39026</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>60</td>\n",
       "      <td>test_Client_100</td>\n",
       "      <td>11</td>\n",
       "      <td>101</td>\n",
       "      <td>1992</td>\n",
       "      <td>10</td>\n",
       "      <td>8</td>\n",
       "      <td>40</td>\n",
       "      <td>244.350000</td>\n",
       "      <td>9774</td>\n",
       "      <td>0</td>\n",
       "      <td>721</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.575000</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>104.000000</td>\n",
       "      <td>5</td>\n",
       "      <td>203</td>\n",
       "      <td>148</td>\n",
       "      <td>31354</td>\n",
       "      <td>240</td>\n",
       "      <td>31525</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>62</td>\n",
       "      <td>test_Client_1000</td>\n",
       "      <td>11</td>\n",
       "      <td>301</td>\n",
       "      <td>1977</td>\n",
       "      <td>7</td>\n",
       "      <td>21</td>\n",
       "      <td>53</td>\n",
       "      <td>568.188679</td>\n",
       "      <td>30114</td>\n",
       "      <td>0</td>\n",
       "      <td>2400</td>\n",
       "      <td>145.056604</td>\n",
       "      <td>7688</td>\n",
       "      <td>0</td>\n",
       "      <td>1362</td>\n",
       "      <td>33.679245</td>\n",
       "      <td>1785</td>\n",
       "      <td>0</td>\n",
       "      <td>1340</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.905660</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>135.754717</td>\n",
       "      <td>5</td>\n",
       "      <td>203</td>\n",
       "      <td>0</td>\n",
       "      <td>41148</td>\n",
       "      <td>116</td>\n",
       "      <td>41215</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   disrict         client_id  client_catg  region  creation_year  \\\n",
       "0       62     test_Client_0           11     307           2002   \n",
       "1       69     test_Client_1           11     103           2009   \n",
       "2       62    test_Client_10           11     310           2004   \n",
       "3       60   test_Client_100           11     101           1992   \n",
       "4       62  test_Client_1000           11     301           1977   \n",
       "\n",
       "   creation_month  creation_day  1transactions_count  \\\n",
       "0               5            28                   37   \n",
       "1               8             6                   22   \n",
       "2               4             7                   74   \n",
       "3              10             8                   40   \n",
       "4               7            21                   53   \n",
       "\n",
       "   consommation_level_1_mean  consommation_level_1_sum  \\\n",
       "0                 488.135135                     18061   \n",
       "1                1091.409091                     24011   \n",
       "2                 554.040541                     40999   \n",
       "3                 244.350000                      9774   \n",
       "4                 568.188679                     30114   \n",
       "\n",
       "   consommation_level_1_min  consommation_level_1_max  \\\n",
       "0                         0                      1090   \n",
       "1                        11                      3600   \n",
       "2                         0                      1200   \n",
       "3                         0                       721   \n",
       "4                         0                      2400   \n",
       "\n",
       "   consommation_level_2_mean  consommation_level_2_sum  \\\n",
       "0                   3.243243                       120   \n",
       "1                 843.136364                     18549   \n",
       "2                  37.364865                      2765   \n",
       "3                   0.000000                         0   \n",
       "4                 145.056604                      7688   \n",
       "\n",
       "   consommation_level_2_min  consommation_level_2_max  \\\n",
       "0                         0                       120   \n",
       "1                         0                      4053   \n",
       "2                         0                       400   \n",
       "3                         0                         0   \n",
       "4                         0                      1362   \n",
       "\n",
       "   consommation_level_3_mean  consommation_level_3_sum  \\\n",
       "0                   0.000000                         0   \n",
       "1                 182.318182                      4011   \n",
       "2                  15.743243                      1165   \n",
       "3                   0.000000                         0   \n",
       "4                  33.679245                      1785   \n",
       "\n",
       "   consommation_level_3_min  consommation_level_3_max  \\\n",
       "0                         0                         0   \n",
       "1                         0                      1144   \n",
       "2                         0                       800   \n",
       "3                         0                         0   \n",
       "4                         0                      1340   \n",
       "\n",
       "   consommation_level_4_mean  consommation_level_4_sum  \\\n",
       "0                   0.000000                         0   \n",
       "1                 586.318182                     12899   \n",
       "2                   0.162162                        12   \n",
       "3                   0.000000                         0   \n",
       "4                   0.000000                         0   \n",
       "\n",
       "   consommation_level_4_min  consommation_level_4_max  reading_remarque_mean  \\\n",
       "0                         0                         0               6.810811   \n",
       "1                         0                     12899               7.636364   \n",
       "2                         0                        12               7.459459   \n",
       "3                         0                         0               6.575000   \n",
       "4                         0                         0               7.905660   \n",
       "\n",
       "   reading_remarque_min  reading_remarque_max  counter_code_mean  \\\n",
       "0                     6                     9         203.000000   \n",
       "1                     6                     9         433.000000   \n",
       "2                     6                     9         119.648649   \n",
       "3                     6                     9         104.000000   \n",
       "4                     6                     9         135.754717   \n",
       "\n",
       "   counter_code_min  counter_code_max  old_index_min  old_index_max  \\\n",
       "0               203               203           3057          21348   \n",
       "1               433               433              7          55213   \n",
       "2                 5               207              0          37461   \n",
       "3                 5               203            148          31354   \n",
       "4                 5               203              0          41148   \n",
       "\n",
       "   new_index_min  new_index_max  \n",
       "0           3383          21677  \n",
       "1            800          57083  \n",
       "2             41          39026  \n",
       "3            240          31525  \n",
       "4            116          41215  "
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Assuming you have the functions and DataFrame 'invoice_train' defined previously\n",
    "\n",
    "# Step 1: Calculate the diff_months and drop unnecessary columns in 'invoice_train'\n",
    "invoice_test = calculate_diff_months(invoice_test, date_column='invoice_date', id_column='client_id')\n",
    "\n",
    "columns_to_drop = ['invoice_date', 'counter_statue', 'reading_remarque', 'counter_code', 'counter_coefficient',\n",
    "                   'old_index', 'new_index',]\n",
    "invoice_test.drop(columns=columns_to_drop, inplace=True)\n",
    "\n",
    "#test['index_difference'] = test['new_index_mean'] - test['old_index_mean']\n",
    "\n",
    "# Step 3: Drop duplicates in 'invoice_train' based on 'client_id'\n",
    "invoice_test = invoice_test.drop_duplicates(subset='client_id', keep='first')\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>client_id</th>\n",
       "      <th>tarif_type</th>\n",
       "      <th>counter_number</th>\n",
       "      <th>consommation_level_1</th>\n",
       "      <th>consommation_level_2</th>\n",
       "      <th>consommation_level_3</th>\n",
       "      <th>consommation_level_4</th>\n",
       "      <th>months_number</th>\n",
       "      <th>counter_type</th>\n",
       "      <th>diff_months</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>test_Client_0</td>\n",
       "      <td>11</td>\n",
       "      <td>651208</td>\n",
       "      <td>755</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>165.566667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>test_Client_1</td>\n",
       "      <td>11</td>\n",
       "      <td>174760</td>\n",
       "      <td>800</td>\n",
       "      <td>400</td>\n",
       "      <td>180</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>124.800000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>59</th>\n",
       "      <td>test_Client_10</td>\n",
       "      <td>11</td>\n",
       "      <td>799814</td>\n",
       "      <td>800</td>\n",
       "      <td>84</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>167.400000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>133</th>\n",
       "      <td>test_Client_100</td>\n",
       "      <td>11</td>\n",
       "      <td>1064358</td>\n",
       "      <td>562</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>82.866667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>173</th>\n",
       "      <td>test_Client_1000</td>\n",
       "      <td>11</td>\n",
       "      <td>8524250</td>\n",
       "      <td>860</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>171.233333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            client_id  tarif_type  counter_number  consommation_level_1  \\\n",
       "0       test_Client_0          11          651208                   755   \n",
       "37      test_Client_1          11          174760                   800   \n",
       "59     test_Client_10          11          799814                   800   \n",
       "133   test_Client_100          11         1064358                   562   \n",
       "173  test_Client_1000          11         8524250                   860   \n",
       "\n",
       "     consommation_level_2  consommation_level_3  consommation_level_4  \\\n",
       "0                       0                     0                     0   \n",
       "37                    400                   180                     0   \n",
       "59                     84                     0                     0   \n",
       "133                     0                     0                     0   \n",
       "173                     0                     0                     0   \n",
       "\n",
       "     months_number  counter_type  diff_months  \n",
       "0                8             0   165.566667  \n",
       "37               4             0   124.800000  \n",
       "59               4             0   167.400000  \n",
       "133              4             0    82.866667  \n",
       "173              4             0   171.233333  "
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "invoice_test.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Step 4: Merge 'invoice_train' with 'train' on 'client_id'\n",
    "test = pd.merge(test, invoice_test, on='client_id', how='left')\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "train['totale_consommation']=train['consommation_level_1']+train['consommation_level_2']+train['consommation_level_3']+train['consommation_level_4']\n",
    "test['totale_consommation']=test['consommation_level_1']+test['consommation_level_2']+test['consommation_level_3']+test['consommation_level_4']\n",
    "train = pd.get_dummies(train, columns=['disrict'])\n",
    "test = pd.get_dummies(test, columns=['disrict'])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(135493, 48)"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(58069, 47)"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "id": "BwYBcKk8X3mr"
   },
   "outputs": [],
   "source": [
    "#drop redundant columns\n",
    "sub_client_id = test['client_id']\n",
    "drop_columns = ['client_id', 'creation_date']\n",
    "\n",
    "for col in drop_columns:\n",
    "    if col in train.columns:\n",
    "        train.drop([col], axis=1, inplace=True)\n",
    "    if col in test.columns:\n",
    "        test.drop([col], axis=1, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>client_catg</th>\n",
       "      <th>region</th>\n",
       "      <th>target</th>\n",
       "      <th>creation_year</th>\n",
       "      <th>creation_month</th>\n",
       "      <th>creation_day</th>\n",
       "      <th>1transactions_count</th>\n",
       "      <th>consommation_level_1_mean</th>\n",
       "      <th>consommation_level_1_sum</th>\n",
       "      <th>consommation_level_1_min</th>\n",
       "      <th>consommation_level_1_max</th>\n",
       "      <th>consommation_level_2_mean</th>\n",
       "      <th>consommation_level_2_sum</th>\n",
       "      <th>consommation_level_2_min</th>\n",
       "      <th>consommation_level_2_max</th>\n",
       "      <th>consommation_level_3_mean</th>\n",
       "      <th>consommation_level_3_sum</th>\n",
       "      <th>consommation_level_3_min</th>\n",
       "      <th>consommation_level_3_max</th>\n",
       "      <th>consommation_level_4_mean</th>\n",
       "      <th>consommation_level_4_sum</th>\n",
       "      <th>consommation_level_4_min</th>\n",
       "      <th>consommation_level_4_max</th>\n",
       "      <th>reading_remarque_mean</th>\n",
       "      <th>reading_remarque_min</th>\n",
       "      <th>reading_remarque_max</th>\n",
       "      <th>counter_code_mean</th>\n",
       "      <th>counter_code_min</th>\n",
       "      <th>counter_code_max</th>\n",
       "      <th>old_index_min</th>\n",
       "      <th>old_index_max</th>\n",
       "      <th>new_index_min</th>\n",
       "      <th>new_index_max</th>\n",
       "      <th>tarif_type</th>\n",
       "      <th>counter_number</th>\n",
       "      <th>consommation_level_1</th>\n",
       "      <th>consommation_level_2</th>\n",
       "      <th>consommation_level_3</th>\n",
       "      <th>consommation_level_4</th>\n",
       "      <th>months_number</th>\n",
       "      <th>counter_type</th>\n",
       "      <th>diff_months</th>\n",
       "      <th>totale_consommation</th>\n",
       "      <th>disrict_60</th>\n",
       "      <th>disrict_62</th>\n",
       "      <th>disrict_63</th>\n",
       "      <th>disrict_69</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>11</td>\n",
       "      <td>101</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1994</td>\n",
       "      <td>12</td>\n",
       "      <td>31</td>\n",
       "      <td>35</td>\n",
       "      <td>352.400000</td>\n",
       "      <td>12334</td>\n",
       "      <td>38</td>\n",
       "      <td>1200</td>\n",
       "      <td>10.571429</td>\n",
       "      <td>370</td>\n",
       "      <td>0</td>\n",
       "      <td>186</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.971429</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.685714</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>3685</td>\n",
       "      <td>16493</td>\n",
       "      <td>3809</td>\n",
       "      <td>17078</td>\n",
       "      <td>11</td>\n",
       "      <td>1335667</td>\n",
       "      <td>82</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>163.366667</td>\n",
       "      <td>82</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>11</td>\n",
       "      <td>107</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2002</td>\n",
       "      <td>5</td>\n",
       "      <td>29</td>\n",
       "      <td>37</td>\n",
       "      <td>557.540541</td>\n",
       "      <td>20629</td>\n",
       "      <td>190</td>\n",
       "      <td>1207</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.216216</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.000000</td>\n",
       "      <td>203</td>\n",
       "      <td>203</td>\n",
       "      <td>4110</td>\n",
       "      <td>23940</td>\n",
       "      <td>4661</td>\n",
       "      <td>25022</td>\n",
       "      <td>11</td>\n",
       "      <td>678902</td>\n",
       "      <td>388</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>163.766667</td>\n",
       "      <td>388</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>11</td>\n",
       "      <td>301</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1986</td>\n",
       "      <td>3</td>\n",
       "      <td>13</td>\n",
       "      <td>18</td>\n",
       "      <td>798.611111</td>\n",
       "      <td>14375</td>\n",
       "      <td>188</td>\n",
       "      <td>2400</td>\n",
       "      <td>37.888889</td>\n",
       "      <td>682</td>\n",
       "      <td>0</td>\n",
       "      <td>682</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.055556</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>203.222222</td>\n",
       "      <td>203</td>\n",
       "      <td>207</td>\n",
       "      <td>25515</td>\n",
       "      <td>41532</td>\n",
       "      <td>25974</td>\n",
       "      <td>44614</td>\n",
       "      <td>11</td>\n",
       "      <td>572765</td>\n",
       "      <td>407</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>164.033333</td>\n",
       "      <td>407</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>11</td>\n",
       "      <td>105</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1996</td>\n",
       "      <td>7</td>\n",
       "      <td>11</td>\n",
       "      <td>20</td>\n",
       "      <td>1.200000</td>\n",
       "      <td>24</td>\n",
       "      <td>0</td>\n",
       "      <td>15</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6.150000</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>413.000000</td>\n",
       "      <td>413</td>\n",
       "      <td>413</td>\n",
       "      <td>90</td>\n",
       "      <td>99</td>\n",
       "      <td>90</td>\n",
       "      <td>114</td>\n",
       "      <td>11</td>\n",
       "      <td>2078</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>88.800000</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11</td>\n",
       "      <td>303</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2014</td>\n",
       "      <td>10</td>\n",
       "      <td>14</td>\n",
       "      <td>14</td>\n",
       "      <td>663.714286</td>\n",
       "      <td>9292</td>\n",
       "      <td>124</td>\n",
       "      <td>800</td>\n",
       "      <td>104.857143</td>\n",
       "      <td>1468</td>\n",
       "      <td>0</td>\n",
       "      <td>400</td>\n",
       "      <td>117.357143</td>\n",
       "      <td>1643</td>\n",
       "      <td>0</td>\n",
       "      <td>800</td>\n",
       "      <td>36.714286</td>\n",
       "      <td>514</td>\n",
       "      <td>0</td>\n",
       "      <td>382</td>\n",
       "      <td>8.857143</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>207.000000</td>\n",
       "      <td>207</td>\n",
       "      <td>207</td>\n",
       "      <td>0</td>\n",
       "      <td>13337</td>\n",
       "      <td>959</td>\n",
       "      <td>13729</td>\n",
       "      <td>11</td>\n",
       "      <td>19575</td>\n",
       "      <td>800</td>\n",
       "      <td>159</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>52.833333</td>\n",
       "      <td>959</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   client_catg  region  target  creation_year  creation_month  creation_day  \\\n",
       "0           11     101     0.0           1994              12            31   \n",
       "1           11     107     0.0           2002               5            29   \n",
       "2           11     301     0.0           1986               3            13   \n",
       "3           11     105     0.0           1996               7            11   \n",
       "4           11     303     0.0           2014              10            14   \n",
       "\n",
       "   1transactions_count  consommation_level_1_mean  consommation_level_1_sum  \\\n",
       "0                   35                 352.400000                     12334   \n",
       "1                   37                 557.540541                     20629   \n",
       "2                   18                 798.611111                     14375   \n",
       "3                   20                   1.200000                        24   \n",
       "4                   14                 663.714286                      9292   \n",
       "\n",
       "   consommation_level_1_min  consommation_level_1_max  \\\n",
       "0                        38                      1200   \n",
       "1                       190                      1207   \n",
       "2                       188                      2400   \n",
       "3                         0                        15   \n",
       "4                       124                       800   \n",
       "\n",
       "   consommation_level_2_mean  consommation_level_2_sum  \\\n",
       "0                  10.571429                       370   \n",
       "1                   0.000000                         0   \n",
       "2                  37.888889                       682   \n",
       "3                   0.000000                         0   \n",
       "4                 104.857143                      1468   \n",
       "\n",
       "   consommation_level_2_min  consommation_level_2_max  \\\n",
       "0                         0                       186   \n",
       "1                         0                         0   \n",
       "2                         0                       682   \n",
       "3                         0                         0   \n",
       "4                         0                       400   \n",
       "\n",
       "   consommation_level_3_mean  consommation_level_3_sum  \\\n",
       "0                   0.000000                         0   \n",
       "1                   0.000000                         0   \n",
       "2                   0.000000                         0   \n",
       "3                   0.000000                         0   \n",
       "4                 117.357143                      1643   \n",
       "\n",
       "   consommation_level_3_min  consommation_level_3_max  \\\n",
       "0                         0                         0   \n",
       "1                         0                         0   \n",
       "2                         0                         0   \n",
       "3                         0                         0   \n",
       "4                         0                       800   \n",
       "\n",
       "   consommation_level_4_mean  consommation_level_4_sum  \\\n",
       "0                   0.000000                         0   \n",
       "1                   0.000000                         0   \n",
       "2                   0.000000                         0   \n",
       "3                   0.000000                         0   \n",
       "4                  36.714286                       514   \n",
       "\n",
       "   consommation_level_4_min  consommation_level_4_max  reading_remarque_mean  \\\n",
       "0                         0                         0               6.971429   \n",
       "1                         0                         0               7.216216   \n",
       "2                         0                         0               7.055556   \n",
       "3                         0                         0               6.150000   \n",
       "4                         0                       382               8.857143   \n",
       "\n",
       "   reading_remarque_min  reading_remarque_max  counter_code_mean  \\\n",
       "0                     6                     9         203.685714   \n",
       "1                     6                     9         203.000000   \n",
       "2                     6                     9         203.222222   \n",
       "3                     6                     9         413.000000   \n",
       "4                     8                     9         207.000000   \n",
       "\n",
       "   counter_code_min  counter_code_max  old_index_min  old_index_max  \\\n",
       "0               203               207           3685          16493   \n",
       "1               203               203           4110          23940   \n",
       "2               203               207          25515          41532   \n",
       "3               413               413             90             99   \n",
       "4               207               207              0          13337   \n",
       "\n",
       "   new_index_min  new_index_max  tarif_type  counter_number  \\\n",
       "0           3809          17078          11         1335667   \n",
       "1           4661          25022          11          678902   \n",
       "2          25974          44614          11          572765   \n",
       "3             90            114          11            2078   \n",
       "4            959          13729          11           19575   \n",
       "\n",
       "   consommation_level_1  consommation_level_2  consommation_level_3  \\\n",
       "0                    82                     0                     0   \n",
       "1                   388                     0                     0   \n",
       "2                   407                     0                     0   \n",
       "3                     0                     0                     0   \n",
       "4                   800                   159                     0   \n",
       "\n",
       "   consommation_level_4  months_number  counter_type  diff_months  \\\n",
       "0                     0              4             0   163.366667   \n",
       "1                     0              2             0   163.766667   \n",
       "2                     0              4             0   164.033333   \n",
       "3                     0              4             0    88.800000   \n",
       "4                     0              4             0    52.833333   \n",
       "\n",
       "   totale_consommation  disrict_60  disrict_62  disrict_63  disrict_69  \n",
       "0                   82           1           0           0           0  \n",
       "1                  388           0           0           0           1  \n",
       "2                  407           0           1           0           0  \n",
       "3                    0           0           0           0           1  \n",
       "4                  959           0           1           0           0  "
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iJ3KlVOCYfnZ"
   },
   "source": [
    "# Modelling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d1qXp8jYY30X"
   },
   "source": [
    "## Train LGBM *Classifier*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "id": "QTr8Wbu1ZKTM"
   },
   "outputs": [],
   "source": [
    "x_train = train.drop(columns=['target'])\n",
    "y_train = train['target']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).\n",
      "[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).\n",
      "[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.019641 seconds.\n",
      "You can set `force_col_wise=true` to remove the overhead.\n",
      "[LightGBM] [Info] Total Bins 7661\n",
      "[LightGBM] [Info] Number of data points in the train set: 135493, number of used features: 46\n",
      "[LightGBM] [Info] Start training from score 0.500000\n",
      "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n",
      "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n",
      "[LightGBM] [Warning] No further splits with positive gain, best gain: -inf\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import lightgbm as lgb\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import mean_squared_error\n",
    "\n",
    "\n",
    "# Count the number of samples for each target value (0 and 1)\n",
    "counts = y_train.value_counts()\n",
    "\n",
    "# Calculate the weight for each target value\n",
    "class_weights = {0: 1.0, 1: counts[0] / counts[1]}\n",
    "\n",
    "# Create LightGBM dataset\n",
    "train_data = lgb.Dataset(x_train, label=y_train, weight=y_train.map(class_weights))\n",
    "\n",
    "# Define LightGBM parameters\n",
    "lgb_params = {\n",
    "    'objective': 'regression',  # Regression task\n",
    "    'boosting_type': 'gbdt',\n",
    "    'metric': 'mse',\n",
    "    'learning_rate': 0.01,\n",
    "    'num_leaves': 31,\n",
    "    'max_depth': 7,\n",
    "    'min_child_samples': 20,\n",
    "    'subsample': 0.8,\n",
    "    'colsample_bytree': 0.8,\n",
    "    'n_estimators': 1000,\n",
    "    'random_state': 40\n",
    "}\n",
    "\n",
    "# Train LightGBM model with weighted samples\n",
    "model = lgb.train(lgb_params, train_data)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "CKsrOiePadZ7"
   },
   "source": [
    "## Make Predictions on test set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.342939</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.742117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.544523</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.176441</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.441177</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     target\n",
       "0  0.342939\n",
       "1  0.742117\n",
       "2  0.544523\n",
       "3  0.176441\n",
       "4  0.441177"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "preds = model.predict(test)\n",
    "preds = pd.DataFrame(preds, columns=['target'])\n",
    "preds.head()"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "The Score was 0.84"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
